Abstract

Purpose. We previously proposed that Hebbian adjustments that are incompletely synapse specific ("crosstalk") 
might be analogous to genetic mutations. We analyze aspects of the effect of crosstalk in Hebbian learning using 
the classical Oja model. 

Methods. In previous work we showed that crosstalk leads to learning of the principal eigenvector of EC (the input covariance matrix pre-multiplied by an error matrix that describes the crosstalk pattern), and found that, with positive input correlations, increasing crosstalk smoothly degrades performance. However, the Oja model requires negative input correlations to account for biological ocular segregation. Although this assumption is biologically somewhat implausible, it captures features that are seen in more complex models. Here, we analyze how crosstalk would affect such segregation.

Results. We show that for statistically unbiased inputs crosstalk induces a bifurcation from segregating to non-segregating outcomes at a critical value which depends on correlations. We also investigate the behavior in the vicinity of this critical state and for weakly biased inputs.

Conclusions. Our results show that crosstalk can induce a bifurcation under special conditions even in the simplest Hebbian models and that even the low levels of crosstalk observed in the brain could prevent normal development. However, during learning pairwise input statistics are more complex and crosstalk-induced bifurcations may not occur in the Oja model. Such bifurcations would be analogous to "error catastrophes" in genetic models, and we argue that they are usually absent for simple linear Hebbian learning because such learning is only driven by pairwise correlations.
1 Introduction



1.1 Background 

Learning is thought to occur as a result of changes in synaptic strength triggered by pre- and postsynaptic neural activity, in a "Hebbian" manner. Such changes are not completely specific to the synapses at which the activity occurs [23] [S], because of inevitable, albeit minimal, second-messenger diffusion.

Oja [33] showed that a simple model neuron could perform unsupervised Hebbian learning of the first principal component of an input distribution. In this model, unlimited weight growth is prevented using an additional term in the learning rule, producing an implicit, "multiplicative" weight normalization [30]. Biological synapses do show
Hebbian properties, using well-understood, spike-coincidence detection machinery, raising the possibility that real 
neurons can exhibit similar unsupervised learning. Finding principal components could be very useful in the brain for 
data compression and transmission, since for Gaussian data such representations have statistically optimal properties, 
and often neural signals are approximately Gaussian. Furthermore, representational learning often requires that 
inputs be pairwise decorrelated. Hebbian learning can also explain developmental changes, such as the segregation 
of visual input to central neurons. 

Recent data suggest [23] [7] [9] that weight updates may be affected by each other, for example due to unavoidable residual second-messenger diffusion between closely spaced synapses. We have suggested that such crosstalk is analogous to mutation in genetics, and that cortical circuitry may be specialized to reduce it. However, it is not clear that learning would be subject to an "error catastrophe" such as that occurring in genetic systems [H]. If complete learning failure does not occur at a critical, low, crosstalk level, such circuitry might not be necessary.

In a recent paper [36] we examined how crosstalk would affect the Oja model. We considered a learning network consisting of a single output neuron receiving, through a set of n input neurons, n signals $x = (x_1, \ldots, x_n)^T$ drawn from a probability distribution $\mathcal{P}(x)$, $x \in \mathbb{R}^n$, transmitted via synaptic connections of strengths $w = (w_1, \ldots, w_n)^T$. The resulting scalar output y was generated as the weighted sum of the inputs, $y = x^T w$.

The synaptic weights $w_i$ were modified in accordance with Oja's learning rule, by first implementing a Hebb-like strengthening proportional to the product of $x_i$ and $y$ (with a small constant of proportionality, or learning rate, $\gamma$):

$$w_i(t+1) = w_i(t) + \gamma\, y(t)\, x_i(t)$$

followed by an approximate "normalization" step (applicable for small $\gamma$ and $\|w\|$ close to one), maintaining the Euclidean norm of the weight vector $w = (w_1, \ldots, w_n)^T$ close to one:

$$w(t+1) = w(t) + \gamma\, y(t)\,[x(t) - y(t)\, w(t)]$$

We considered the long-term average of this Oja equation, using the input covariance matrix $C = \langle x x^T \rangle$ as an appropriate long-term characterization of the inputs, and studying the behavior of the averaged weight vector $\langle w(t+1) \mid w(t) \rangle$, again written $w$:

$$\frac{dw}{dt} = \gamma\,[Cw - (w^T C w)\, w]$$

in continuous time, or:

$$\Delta w = \gamma\,[Cw - (w^T C w)\, w]$$

as a discrete time approximation.
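
As a minimal numerical sketch of the averaged rule above (not the authors' code), the following iterates the discrete update $\Delta w = \gamma[Cw - (w^T C w)w]$ for a hypothetical 2×2 covariance matrix. The matrix entries, the step size, the starting point, and the helper names `matvec` and `oja_step` are illustrative assumptions.

```python
import math

def matvec(M, w):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, w)) for row in M]

def oja_step(C, w, gamma):
    """One step of the averaged Oja rule: w + gamma * [Cw - (w^T C w) w]."""
    Cw = matvec(C, w)
    wCw = sum(a * b for a, b in zip(w, Cw))
    return [wi + gamma * (cwi - wCw * wi) for wi, cwi in zip(w, Cw)]

# Hypothetical covariance matrix with positive correlation (assumed values).
C = [[1.0, 0.4],
     [0.4, 1.0]]

w = [0.3, -0.1]            # arbitrary nonzero initial weights
for _ in range(5000):
    w = oja_step(C, w, gamma=0.01)

norm = math.hypot(w[0], w[1])
print(w, norm)
```

For this symmetric example the leading eigenvector of C points along (1, 1), so the iterate should settle on equal weights with Euclidean norm close to one.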

We then introduced inspecificity into the learning equation [36]. We implemented this inspecificity by assuming that, on average, only a fraction q of the intended update reaches the appropriate connection, the remaining fraction $1 - q$ being distributed amongst the other connections (according to a rule which we defined according to plausible underlying biology). The quality factor q is analogous to a similar factor in molecular evolution theory that represents the fidelity of single-base copying [41]. The actual update at a given connection thus includes contributions from erroneous or inaccurate updates from other connections. The erroneous updating process was formally described by an error matrix $\mathcal{E}$, independent of the inputs, whose elements, which depend on average on q, reflect at each time step t the fractional contribution that the activity through the connection with weight $w_j$ makes to the update of $w_i$:

$$w_i(t+1) = w_i(t) + \gamma\, y\,([\mathcal{E}x]_i - y\, w_i)$$

The discrete long-term statistics can then be written in matrix form as:

$$\Delta w = \gamma\,[ECw - (w^T C w)\, w]$$

where the "error matrix" $E = \langle \mathcal{E} \rangle$ is a symmetric matrix with positive entries, which equals the identity matrix $I \in \mathcal{M}_n(\mathbb{R})$ in the case of perfect quality updates. In continuous time, the rule becomes:

$$\frac{dw}{dt} = \gamma\,[ECw - (w^T C w)\, w] \qquad (1)$$


Throughout the paper, we call this the (inspecific) Oja rule with continuous time updates. 
We studied the asymptotic behavior of this n-dimensional system, starting with a local linear analysis of the equilibria 
and their stability. Although this rule is nonlinear, the Hebbian update term is linear in the output, and we sometimes 
refer to this, and related, rules, as being "linear," in contrast to other Hebbian rules [25J 24, BJ 34| [T5J EU [10] which 
are nonlinear in the output. 

Note that the symmetric, positive definite matrix $C \in \mathcal{M}_n(\mathbb{R})$ defines a dot product between any two vectors w and v in $\mathbb{R}^n$ as $\langle v, w \rangle_C = v^T C w$. Although both C and E are symmetric, the product EC is in general not symmetric in the Euclidean metric. However, in the new metric defined by the dot product $\langle \cdot, \cdot \rangle_C$, EC is symmetric: $\langle ECu, v \rangle_C = (ECu)^T C v = u^T C^T E^T C v = u^T C E C v = \langle u, ECv \rangle_C$, for all $u, v \in \mathbb{R}^n$. Hence EC has a basis of eigenvectors, orthogonal with respect to the dot product $\langle \cdot, \cdot \rangle_C$. The following is immediate:

Description of equilibria of the system (1). An equilibrium for the system is any vector $w = (w_1, \ldots, w_n)^T$ such that $ECw = (w^T C w)\, w$, i.e., an eigenvector of EC (with corresponding eigenvalue $\lambda_w$), normalized, with respect to the norm $\|v\|_C = \langle v, v \rangle_C$, so that $\|w\|_C = \lambda_w$. Generically, EC has a strictly positive, unique maximal eigenvalue, and the corresponding eigendirection is orthogonal in $\langle \cdot, \cdot \rangle_C$ to all other eigenvectors of EC.
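
The symmetry of EC with respect to the C-weighted dot product can be checked numerically. The sketch below is only an illustration under assumed values: the 2×2 matrices, the parameter values, and the helper names (`matmul`, `matvec`, `dot_C`) are hypothetical and chosen for convenience.

```python
import random

def matmul(A, B):
    """Product of two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def matvec(M, w):
    """Matrix-vector product."""
    return [sum(m * x for m, x in zip(row, w)) for row in M]

def dot_C(u, v, C):
    """C-weighted dot product <u, v>_C = u^T C v."""
    return sum(ui * cvi for ui, cvi in zip(u, matvec(C, v)))

# Assumed illustrative parameters (variance v_, covariance c_, bias delta, quality q).
v_, c_, delta, q = 1.0, -0.4, 0.2, 0.9
C = [[v_ + delta, c_], [c_, v_]]
E = [[q, 1 - q], [1 - q, q]]
EC = matmul(E, C)

random.seed(0)
u = [random.uniform(-1, 1) for _ in range(2)]
v = [random.uniform(-1, 1) for _ in range(2)]

lhs = dot_C(matvec(EC, u), v, C)          # <ECu, v>_C
rhs = dot_C(u, matvec(EC, v), C)          # <u, ECv>_C
euclid_asym = abs(EC[0][1] - EC[1][0])    # nonzero: EC is not Euclidean-symmetric
print(lhs, rhs, euclid_asym)
```

With these values the two C-weighted products agree to rounding error, while the off-diagonal entries of EC differ by $(1-q)\delta$.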

For an equilibrium w of the system (1), the Jacobian matrix $Df_w$ around w is (see Appendix 1):

$$Df_w = \gamma\,[EC - 2w(Cw)^T - (w^T C w)\, I] \qquad (2)$$

Then we have the following (see Appendix 1 for proof):

Stability criteria for equilibria. Suppose EC has a multiplicity one largest eigenvalue. A normalized eigenvector w is a local hyperbolic attracting equilibrium for (1) iff it corresponds to the maximal eigenvalue of EC.

Such attractors always exist provided EC has a maximal eigenvalue of multiplicity one, which is generically true. Then the network learns, depending on its initial state, one of the two stable equilibria, which are the two (opposite) maximal eigenvectors of the modified input distribution, normalized so that $\|w\|_C = \lambda_w$. It can be shown easily that these two attractors (the appropriately normalized eigenvectors corresponding to the maximal eigenvalue of EC) are the only attractors in the system (see Appendix 2 for proof).



In a previous paper [36], we further analyzed the sensitivity of the system under variations of parameters, for some biologically plausible forms of the covariance and error matrices:

$$C = \begin{pmatrix} v+\delta_1 & c & \cdots & c \\ c & v+\delta_2 & \cdots & c \\ \vdots & & \ddots & \vdots \\ c & c & \cdots & v+\delta_n \end{pmatrix} \quad \text{and} \quad E = \begin{pmatrix} q & \epsilon & \cdots & \epsilon \\ \epsilon & q & \cdots & \epsilon \\ \vdots & & \ddots & \vdots \\ \epsilon & \epsilon & \cdots & q \end{pmatrix}$$

where the input covariance matrix had uniform covariances $c > 0$ and variance biases $\delta_1 > \delta_2 > \cdots > \delta_n$; the error matrix was defined such that $q > \epsilon > 0$, $q + (n-1)\epsilon = 1$. Our analysis of this system concluded that biologically realistic levels of crosstalk would typically produce only small, gradual changes in the learning process, though when inputs carry very similar signals, the effects could be more dramatic. In this paper we explore this "very similar" scenario more thoroughly. In particular we describe the effect of crosstalk in the special "unbiased" case, where the inputs have identical statistics.
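
To make the structure of these matrices concrete, the following sketch builds C and E for a small n and finds the leading eigenvector of EC by power iteration. The numerical values, the function names, and the use of power iteration are assumptions made for illustration; the original analysis was carried out analytically.

```python
def build_matrices(n, v, c, deltas, q):
    """Return (C, E): C has diagonal v + deltas[i] and off-diagonal c;
    E has diagonal q and off-diagonal eps, with q + (n - 1) * eps = 1."""
    eps = (1.0 - q) / (n - 1)
    C = [[v + deltas[i] if i == j else c for j in range(n)] for i in range(n)]
    E = [[q if i == j else eps for j in range(n)] for i in range(n)]
    return C, E

def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def matvec(M, w):
    return [sum(m * x for m, x in zip(row, w)) for row in M]

def leading_eigenpair(M, iters=2000):
    """Power iteration, scaling by the largest component at each step.
    Adequate here because all entries of M are positive."""
    w = [1.0] * len(M)
    lam = 0.0
    for _ in range(iters):
        Mw = matvec(M, w)
        lam = max(abs(x) for x in Mw)
        w = [x / lam for x in Mw]
    return lam, w

# Assumed illustrative values: three inputs, positive covariance, mild crosstalk.
C, E = build_matrices(3, v=1.0, c=0.4, deltas=[0.3, 0.2, 0.1], q=0.9)
EC = matmul(E, C)
lam, w = leading_eigenpair(EC)
residual = max(abs(a - lam * b) for a, b in zip(matvec(EC, w), w))
print(lam, w, residual)
```

In this example the learned weights remain ordered like the variance biases, with the most variable input receiving the largest weight.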



1.2 Biased and unbiased inputs 

Our previous analysis considered only distributions of inputs with a bias in the covariance matrix (we imposed the condition that EC has a leading eigenvalue of multiplicity one). While this case is mathematically generic, previous work using related models (without crosstalk) to study learning in the visual system often assumed that the input statistics are "unbiased," or identical for each input (for example, because inputs from corresponding points in the left and right eyes see the same point in visual space). It is well known that in the two-dimensional case, if the two inputs $x_1$ and $x_2$ are positively correlated (as one might anticipate for active vision), linear Hebbian learning does not predict the observed developmental segregation of visual afferents [121 [TOl SH HS]; negative correlations (or a nonlinear rule) are required. However, modifications in learning rules, for example subtractive normalization [22 HS1 EU HZ], a weight-dependent rule [TS] or a BCM rule [TU], although not always originally developed to explain segregation, can overcome this difficulty. Often these rules lead to Hebbian learning driven by a modified version of the covariance matrix. In the current work, we examine the dynamics of Oja learning with crosstalk when inputs are unbiased, and how this changes when a slight bias is introduced.

We show here that in the unbiased negative correlation case, the system undergoes a bifurcation in dynamics 
at a critical crosstalk level. Related results have been obtained by fTTJ [TT] . While there is no true bifurcation in 
the near-unbiased case, the very dramatic change in learning that occurs over a small error range is biologically 
indistinguishable from a true bifurcation. We discuss our results in relation to models of development and learning. 



1.3 Our current model 

In this paper, we will consider the continuous-time, two-dimensional nonlinear rule of Oja (i.e., for two input channels and one output), with covariance matrix C and error matrix E symmetric matrices having the forms

$$C = \begin{pmatrix} v+\delta & c \\ c & v \end{pmatrix} \quad \text{and} \quad E = \begin{pmatrix} q & 1-q \\ 1-q & q \end{pmatrix}.$$

The parameters are such that $1/2 < q \le 1$, $v > 0$ and $c < 0$, with $v > |c|$, $v > |\delta|$ and $v(v+\delta) > c^2$ (i.e., $\det(C) > 0$). The 2D system expands to:

$$\dot{w}_1 = [q(v+\delta) + (1-q)c]\,w_1 + [qc + (1-q)v]\,w_2 - [(v+\delta)w_1^2 + 2c\,w_1 w_2 + v w_2^2]\,w_1$$

$$\dot{w}_2 = [(1-q)(v+\delta) + qc]\,w_1 + [(1-q)c + qv]\,w_2 - [(v+\delta)w_1^2 + 2c\,w_1 w_2 + v w_2^2]\,w_2 \qquad (3)$$

The rest of the paper is centered around this 2-dimensional model. In Section 2, we establish the mathematical background of the model's behavior. We analyze some of its local and global dynamics, observe the dependence of these dynamics on parameters and discuss bifurcations. One of the phenomena central to our interest is how the behavior of the system changes when the bias parameter $\delta$ varies, in particular when it approaches zero (i.e., the inputs are very close to a perfectly unbiased state). In Section 3, we discuss the results in the context of visual modeling and ocular segregation of inputs.

2 Linear analysis of the 2D dynamics 

We notice that the phase plane of the system is symmetric about the origin (i.e., if w(t) is a solution curve for the system, then $-w(t)$ is as well). The trace, determinant and eigenvalues of EC can be obtained easily as expressions of the system parameters:

$$\det(EC) = \det(E)\det(C) = (2q-1)[v(v+\delta) - c^2] > 0$$

$$\mathrm{tr}(EC) = 2(1-q)c + q(2v+\delta) > 0 \quad \text{(from the Cauchy-Schwarz inequality).}$$

Lemma 2.1. For all parameter values, EC has two real eigenvalues $\mu_{1,2}$, which are distinct unless the conditions $\delta = 0$ and $q = q^* = \frac{v}{v-c}$ are simultaneously satisfied. More precisely, when $\mu_1 \ne \mu_2$, we have:

$$\mu_1 = \frac{2(1-q)c + q(2v+\delta) + \sqrt{\Delta}}{2} \quad \text{(larger eigenvalue), with eigenline of slope } z_1 = \frac{-q\delta + \sqrt{\Delta}}{2\beta}$$

$$\mu_2 = \frac{2(1-q)c + q(2v+\delta) - \sqrt{\Delta}}{2} \quad \text{(smaller eigenvalue), with eigenline of slope } z_2 = \frac{-q\delta - \sqrt{\Delta}}{2\beta}$$

where $\beta = qc + (1-q)v$ and $\Delta = [2qc + (1-q)(2v+\delta)]^2 + (2q-1)\delta^2$.

Proof. The calculation of eigenvalues and eigenvectors is immediate from the characteristic equation of EC: $X^2 - \mathrm{tr}(EC)\,X + \det(EC) = 0$, with discriminant

$$\Delta = \mathrm{tr}(EC)^2 - 4\det(EC) = [2qc + (1-q)(2v+\delta)]^2 + (2q-1)\delta^2$$

Notice that $\Delta \ge 0$, with equality $\Delta = 0$ (i.e., double eigenvalue for EC) iff both $\delta = 0$ and $\beta = 0$. The critical quality value (where the two eigenvalues are equal, producing a switch in the dynamics when $\delta = 0$) is $q^* = \frac{v}{v-c}$. Since $v > |c|$, this value occurs within the appropriate q range, (1/2, 1] (see Figure 1). □
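
The closed-form expressions of Lemma 2.1 can be checked against the matrix EC directly. The sketch below is a numerical sanity check under assumed parameter values; the function names `ec_matrix`, `eigen_closed_form`, and `residual` are hypothetical helpers, not part of the original analysis.

```python
import math

def ec_matrix(v, c, delta, q):
    """Entries of EC for E = [[q, 1-q], [1-q, q]] and C = [[v+delta, c], [c, v]]."""
    return [[q * (v + delta) + (1 - q) * c, q * c + (1 - q) * v],
            [(1 - q) * (v + delta) + q * c, (1 - q) * c + q * v]]

def eigen_closed_form(v, c, delta, q):
    """Eigenvalues mu1 >= mu2 and eigenline slopes z1, z2 (requires beta != 0)."""
    beta = q * c + (1 - q) * v
    disc = (2 * q * c + (1 - q) * (2 * v + delta)) ** 2 + (2 * q - 1) * delta ** 2
    tr = 2 * (1 - q) * c + q * (2 * v + delta)
    root = math.sqrt(disc)
    mu1, mu2 = (tr + root) / 2, (tr - root) / 2
    z1 = (-q * delta + root) / (2 * beta)
    z2 = (-q * delta - root) / (2 * beta)
    return mu1, mu2, z1, z2

def residual(M, mu, z):
    """Largest component of M*(1, z) - mu*(1, z); zero for an exact eigenpair."""
    return max(abs(M[0][0] + M[0][1] * z - mu),
               abs(M[1][0] + M[1][1] * z - mu * z))

# Assumed illustrative parameters.
v, c, delta, q = 1.0, -0.4, 0.2, 0.9
M = ec_matrix(v, c, delta, q)
mu1, mu2, z1, z2 = eigen_closed_form(v, c, delta, q)
q_star = v / (v - c)
print(mu1, mu2, z1, z2, q_star)
```

Both residuals should vanish to rounding error, confirming that (1, z_i) is an eigenvector of EC for μ_i at these parameter values.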




Figure 1: Evolution of the eigenvalues as the quality q is varied, in three different $\delta$ slices: $\delta = 0.2$ (A), $\delta = 0$ (B) and $\delta = 0.5$ (C). Fixed parameters: v = 1 and c = -0.4, hence $q^* = 1/1.4 \approx 0.71$. When $\delta = 0$, the eigenvalues $\mu_1$ and $\mu_2$ touch at $q = q^*$. For $\delta \ne 0$, the two curves avoid this crossing; the minimal distance between them occurs near $q = q^*$, but it is strictly positive.



2.1 Equilibria of the 2D system 

Throughout this section, in addition to working in our generally specified parameter ranges, we will assume that EC has distinct eigenvalues (i.e., $\delta \ne 0$ or $q \ne q^*$). In this case, the system has as equilibria the origin $w = 0$, and two pairs of opposite eigenvectors of EC normalized such that $w^T C w = \mu$ (where $\mu$ is the respective eigenvalue of each pair).

The normalization condition can be written as:

$$w^T C w = (v+\delta)w_1^2 + 2c\,w_1 w_2 + v w_2^2 = \mu$$

Using the same notation $z = w_2/w_1$, this can be rewritten as $vz^2 + 2cz + (v+\delta) = \mu/w_1^2$, so that

$$\|w\|^2 = w_1^2(1+z^2) = \frac{\mu(z^2+1)}{vz^2 + 2cz + (v+\delta)}$$

Thus the norm varies with both error and correlation. The position and stability of the four nonzero equilibria vary with the parameters v, c, $\delta$ and q. If we aim to study the sensitivity of the system's dynamics under parameter perturbations, the next step should be establishing the linear stability of these equilibria; this follows directly from the general results in Section 1:

Description and stability of equilibria. Suppose the matrix EC has distinct eigenvalues. The system has five distinct equilibria, $w = 0$ and four normalized eigenvectors of EC. The two (opposite) eigenvectors of the larger eigenvalue are hyperbolic attractors, and the two (opposite) eigenvectors corresponding to the lower eigenvalue are saddles. The origin is repelling.

More precisely, this means that if $\mu_w$ is the larger eigenvalue of EC, the Jacobian matrix $Df_w$ has two negative eigenvalues, hence w is an attracting node. If instead $\mu_w$ is the smaller eigenvalue of EC, then $Df_w$ has two real eigenvalues of opposite signs, and w is a saddle equilibrium.
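
The stability statement can be illustrated by evaluating the Jacobian of equation (2), with the learning rate set to one, at the nonzero equilibria of the 2D model. This is a sketch under assumed parameter values; the helper names `eig2`, `equilibrium`, and `jacobian` are hypothetical.

```python
import math

# Assumed illustrative parameters.
v, c, delta, q = 1.0, -0.4, 0.2, 0.9
C = [[v + delta, c], [c, v]]
EC = [[q * (v + delta) + (1 - q) * c, q * c + (1 - q) * v],
      [(1 - q) * (v + delta) + q * c, (1 - q) * c + q * v]]

def eig2(M):
    """Eigenvalues (larger first) of a 2x2 matrix whose spectrum is real."""
    tr = M[0][0] + M[1][1]
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    root = math.sqrt(max(tr * tr - 4 * det, 0.0))
    return (tr + root) / 2, (tr - root) / 2

def equilibrium(mu):
    """Eigenvector (1, z) of EC for eigenvalue mu, rescaled so that w^T C w = mu."""
    z = (mu - EC[0][0]) / EC[0][1]
    w1 = math.sqrt(mu / (v * z * z + 2 * c * z + v + delta))
    return [w1, w1 * z]

def jacobian(w):
    """Df_w = EC - 2 w (Cw)^T - (w^T C w) I, i.e. equation (2) with gamma = 1."""
    Cw = [C[0][0] * w[0] + C[0][1] * w[1], C[1][0] * w[0] + C[1][1] * w[1]]
    wCw = w[0] * Cw[0] + w[1] * Cw[1]
    return [[EC[i][j] - 2 * w[i] * Cw[j] - (wCw if i == j else 0.0)
             for j in range(2)] for i in range(2)]

mu1, mu2 = eig2(EC)
attractor_spectrum = eig2(jacobian(equilibrium(mu1)))
saddle_spectrum = eig2(jacobian(equilibrium(mu2)))
print(attractor_spectrum, saddle_spectrum)
```

At the equilibrium belonging to the larger eigenvalue both Jacobian eigenvalues are negative, while at the other equilibrium they have opposite signs, in line with the description above.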



We are particularly interested in the behavior near and at $\delta = 0$. The above characterization of equilibria applies when $\delta \ne 0$, but it breaks down in the parameter slice $\delta = 0$, at the critical point when EC has a double eigenvalue. In other words, we expect that the system undergoes a bifurcation in the unbiased $\delta = 0$ slice, which does not exist in the other, $\delta \ne 0$ slices (i.e., when "bias" is present in the inputs); therefore we will study this case separately.

In the following section (Section 2.2) we assume $\delta \ne 0$. The unbiased case $\delta = 0$ is discussed separately in Section 2.3. The results are integrated and concluded in Section 2.4.



2.2 More properties of the phase plane 

One way to describe the dynamics of the system, including the more global aspects and possibly cyclic behavior (which has not yet been excluded), is to follow the rotational direction of the solution trajectories in different regions of the $(w_1, w_2)$ phase plane under the velocity field $(\dot{w}_1, \dot{w}_2)$.

Consider the angle $\theta \in [-\pi/2, \pi/2]$ made by the direction $(w_1, w_2)$ with the $w_1$ axis. As before, call $z = w_2/w_1 = \tan(\theta)$ and $\beta = qc + (1-q)v$. Then, along a trajectory in the $(w_1, w_2)$ plane,

$$\dot{z} = \frac{d}{dt}\left(\frac{w_2}{w_1}\right) = \frac{\dot{w}_2 w_1 - \dot{w}_1 w_2}{w_1^2} = [(1-q)(v+\delta) + qc] - q\delta z - [(1-q)v + qc]\,z^2 = -\beta z^2 - q\delta z + [\beta + (1-q)\delta]$$

We first want to establish if there are any values of z for which $\dot{z} = 0$. These are the slopes along which the rotational speed of the trajectories is zero; in other words, they correspond to invariant lines in the phase plane.

We consider the quadratic equation $\dot{z} = -\beta z^2 - q\delta z + [\beta + (1-q)\delta] = 0$. The discriminant is the same as that of the characteristic equation of EC:

$$\Delta = q^2\delta^2 + 4\beta[\beta + (1-q)\delta] = [2qc + (1-q)(2v+\delta)]^2 + (2q-1)\delta^2$$

The solutions of the quadratic equation are then exactly the slopes of the eigendirections of EC:

$$z_{1,2} = \frac{-q\delta \pm \sqrt{\Delta}}{2\beta} \in [-\infty, +\infty]$$

proving the following:

Lemma 2.2. The eigendirections of the matrix EC represent invariant lines under the vector field of system (1).

We want to better describe the phase-plane behavior between the invariant lines $z = z_1$ and $z = z_2$. For any fixed $q \in (1/2, q^*) \cup (q^*, 1]$ (i.e., for $\beta \ne 0$), the rotational direction is given by the sign of the quadratic function $f(z) = -\beta z^2 - q\delta z + [\beta + (1-q)\delta]$. In principle, we then have two situations:

i. $q \in (1/2, q^*)$ (i.e., $\beta > 0$). Then $z_1 > z_2$, with $\dot{z} > 0$ in $(z_2, z_1)$ and $\dot{z} < 0$ on $(-\infty, z_2) \cup (z_1, \infty)$. The phase plane looks schematically as in Figure 2A.

ii. $q \in (q^*, 1]$ (i.e., $\beta < 0$). Then $z_1 < z_2$, with $\dot{z} < 0$ in $(z_1, z_2)$ and $\dot{z} > 0$ on $(-\infty, z_1) \cup (z_2, \infty)$. The phase plane looks schematically as in Figure 2B.

In other words, all trajectories move asymptotically towards the invariant line $z = z_1$.
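
The sign pattern of the slope dynamics in the two cases can be reproduced numerically. The following sketch evaluates the quadratic $f(z) = -\beta z^2 - q\delta z + [\beta + (1-q)\delta]$ between its roots and integrates it with a forward-Euler scheme; the parameter values, step size, and function names are assumptions made for illustration.

```python
import math

def zdot(z, v, c, delta, q):
    """Rate of change of the slope z = w2/w1 along trajectories."""
    beta = q * c + (1 - q) * v
    return -beta * z * z - q * delta * z + (beta + (1 - q) * delta)

def slopes(v, c, delta, q):
    """Slopes z1, z2 of the invariant lines (requires beta != 0)."""
    beta = q * c + (1 - q) * v
    disc = (2 * q * c + (1 - q) * (2 * v + delta)) ** 2 + (2 * q - 1) * delta ** 2
    root = math.sqrt(disc)
    return (-q * delta + root) / (2 * beta), (-q * delta - root) / (2 * beta)

# Assumed illustrative parameters; here q* = 1/1.4, roughly 0.714.
v, c, delta = 1.0, -0.4, 0.2
q_low, q_high = 0.6, 0.9

z1_low, z2_low = slopes(v, c, delta, q_low)      # case i:  beta > 0
z1_high, z2_high = slopes(v, c, delta, q_high)   # case ii: beta < 0

mid_low = 0.5 * (z1_low + z2_low)
mid_high = 0.5 * (z1_high + z2_high)

# Forward-Euler integration of zdot, starting between the roots in case i.
z = mid_low
for _ in range(20000):
    z += 0.01 * zdot(z, v, c, delta, q_low)
print(z, z1_low)
```

In both cases the slope moves toward $z_1$: upward between the roots when $\beta > 0$ and downward between them when $\beta < 0$.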

Since the behavior of the system seems to a large extent dictated by these invariant lines, we study how the positions of these lines change under variations of the quality parameter q. In other words, we want to study the monotonicity of $z_1 = z_1(q)$ and $z_2 = z_2(q)$. We get the following (for detailed proofs and the limit-case behavior $\lim_{q \to q^{*\pm}} z_{1,2}$, see Appendix 3; for illustrations see Figures 2 and 3):

Proposition 2.3. If $\delta < 0$, then $\frac{dz_{1,2}}{dq} > 0$ and hence both $z_1$ and $z_2$ are increasing for $q \in (1/2, q^*) \cup (q^*, 1]$. In the system's phase plane, this corresponds to a continuous counter-clockwise rotation of the two invariant lines. If $\delta > 0$, then $\frac{dz_{1,2}}{dq} < 0$; hence both $z_{1,2}$ are in this case decreasing for $q \in (1/2, q^*) \cup (q^*, 1]$. In the phase plane, this corresponds to a clockwise rotation of the invariant lines.

Proposition 2.4. The angles $\theta_{1,2} \in [-\pi/2, \pi/2]$ between each invariant line and the $w_1$ abscissa are decreasing with respect to the parameter q in case $\delta > 0$, and are increasing with respect to the parameter q in case $\delta < 0$. Moreover, in both cases, the angular rate of change is finite, at all $q \in (1/2, 1]$.







Figure 2: Invariant lines and generic phase plane dynamics. The invariant lines are marked as $z_1(q)$ and $z_2(q)$. The arrows indicate the rotational direction of the vector field between the two invariant lines. This can be obtained in the right vertical half-plane (where we have defined our angle, $\theta \in [-\pi/2, \pi/2]$), then extended by symmetry to the opposite half-plane. For $q < q^*$ we have $z_1 > z_2$ (A). As q increases, the two invariant lines rotate: clockwise if $\delta > 0$ and anti-clockwise if $\delta < 0$. At $q = q^*$, one of the invariant lines goes through a vertical stage. For $\delta > 0$, $\theta_2$ jumps from $-\pi/2$ to $\pi/2$, hence $z_2$ has a vertical asymptote at $q = q^*$, and jumps from $z_2 \to -\infty$ to $z_2 \to \infty$. For $\delta < 0$, $\theta_1$ jumps from $\pi/2$ to $-\pi/2$, hence $z_1$ has a vertical asymptote at $q = q^*$, and jumps from $z_1 \to \infty$ to $z_1 \to -\infty$. In consequence, after this critical stage, for $q > q^*$, we have $z_1 < z_2$ (B). Although the rotation is continuous, either $z_1$ or $z_2$ has an infinite discontinuity, due to our definition (mod $\pi$) of the angles $\theta_{1,2}$.




Figure 3: Transitions of the phase plane and bifurcation at $q = q^*$, in the slice $\delta = 0$. A. When $q > q^*$, the stable equilibria are the two vectors of norm $\sqrt{q - 1/2}$ (blue dots) along the invariant line of slope $z_1 = -1$; the saddle equilibria are the two eigenvectors of norm 1 (green dots) along the invariant line of slope $z_2 = 1$. As q decreases from q = 1 towards $q = q^*$, the saddles remain unchanged, but the attractors gradually approach the origin (their norm $\sqrt{q - 1/2}$ decreases). B. When $q = q^*$, the system traverses a bifurcation state, characterized by an infinite number (an entire ellipse) of neutrally stable equilibria. This critical state permits the swap of stability between the two invariant lines. C. When $q < q^*$, the stable equilibria are now the two vectors of norm 1 (blue dots) along the invariant line of slope $z_1 = 1$, while the saddle equilibria swapped to the two eigenvectors of norm $\sqrt{q - 1/2}$ (green dots) along the invariant line of slope $z_2 = -1$. As q continues to decrease from $q = q^*$ towards q = 1/2, the attractors remain unchanged, and the saddles approach the origin (collapsing into the origin in the limit of $q \to 1/2$).
2.3 Unbiased case $\delta = 0$

For $\delta = 0$ the computations are simpler; however, as mentioned before, the system has an interesting critical transition which does not appear in the $\delta \ne 0$ slices (occurring from the "touching," or apparent crossing, of the two eigenvalues at $q = q^*$, as shown in Figure 1).

Proposition 2.5. Suppose $\delta = 0$. The phase plane of the system depends on the value of q as follows:

i. If $q < q^*$, then $\mu_1 = v + c$ is the larger eigenvalue, with eigendirection $z_1 = 1$ and norm of the corresponding attracting equilibria $\|w\| = 1$. $\mu_2 = (2q - 1)(v - c)$ is the smaller eigenvalue, with eigendirection $z_2 = -1$ and norm of the corresponding saddle equilibria $\|w\| = \sqrt{q - 1/2}$.

ii. If $q > q^*$, then $\mu_1 = (2q - 1)(v - c)$ is the larger eigenvalue, with eigendirection $z_1 = -1$ and norm of the corresponding attracting equilibria $\|w\| = \sqrt{q - 1/2}$. $\mu_2 = v + c$ is the smaller eigenvalue, with eigendirection $z_2 = 1$ and norm of the corresponding saddle equilibria $\|w\| = 1$.

iii. If $q = q^*$, the system contains an infinity of half-stable non-isolated equilibria (each direction will contain two opposite equilibria, describing overall an ellipse of equilibria around the origin).

Proof. For $\delta = 0$, we have $\dot{z} = -\beta(z^2 - 1)$. The situation $q < q^*$ corresponds to $\beta > 0$, and $q > q^*$ corresponds to $\beta < 0$. Parts i. and ii. follow immediately. For $q = q^*$, $\dot{z} = 0$; all lines through the origin are invariant, and each contains two half-stable equilibria. In Appendix 4, we show that the locus of these equilibria is an ellipse (see dotted curve in Figure 3) and we describe its axes and foci. □
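
The swap of eigendirections in the unbiased case can be seen in a few lines of arithmetic. The sketch below assumes $\delta = 0$ and illustrative values of v and c; the function names are hypothetical. It checks that (1, 1) and (1, -1) are eigenvectors of EC with eigenvalues $v + c$ and $(2q-1)(v-c)$, and reports which slope carries the larger eigenvalue on each side of $q^*$.

```python
def ec_unbiased(v, c, q):
    """EC for delta = 0, with E = [[q, 1-q], [1-q, q]] and C = [[v, c], [c, v]]."""
    return [[q * v + (1 - q) * c, q * c + (1 - q) * v],
            [(1 - q) * v + q * c, (1 - q) * c + q * v]]

def apply(M, w):
    """2x2 matrix-vector product."""
    return [M[0][0] * w[0] + M[0][1] * w[1], M[1][0] * w[0] + M[1][1] * w[1]]

def attracting_slope(v, c, q, tol=1e-12):
    """Slope (+1 or -1) of the eigenline with the larger eigenvalue;
    None when the two eigenvalues coincide (q = q*)."""
    mu_plus = v + c                      # eigenvalue along z = +1
    mu_minus = (2 * q - 1) * (v - c)     # eigenvalue along z = -1
    if abs(mu_plus - mu_minus) < tol:
        return None
    return 1 if mu_plus > mu_minus else -1

# Assumed illustrative parameters.
v, c = 1.0, -0.4
q_star = v / (v - c)

M = ec_unbiased(v, c, 0.9)
along_plus = apply(M, [1.0, 1.0])     # should equal (v + c) * (1, 1)
along_minus = apply(M, [1.0, -1.0])   # should equal (2q - 1)(v - c) * (1, -1)
print(attracting_slope(v, c, 0.6), attracting_slope(v, c, 0.9))
```

Below $q^*$ the larger eigenvalue belongs to the unsegregated direction z = 1, and above $q^*$ to the segregated direction z = -1.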



Remark 1. For $q = 1$, the attracting equilibria lie along the direction $z = -1$, so that $w_1 + w_2 = 0$. As a simple way to quantify how far the stable equilibrium $w = (w_1, w_2)$ degrades from this error-free state as the quality q decreases, we can measure how much the sum $S(q) = |w_1 + w_2|$ deviates from zero, the outcome of perfect learning (Figure 4).





[Figure 4: panels A and B; horizontal axis: quality q.]



Figure 4: $S(q) = |w_1 + w_2|$ as a measure of the increasing inspecificity of the stable equilibrium, compared to its ideal state $S(1) = 0$, as q decays from q = 1. For v = 1, c = -0.4, we plotted S(q). A. For $\delta > 0$: $\delta = 1$ (cyan); $\delta = 0.3$ (blue); $\delta = 0.1$ (purple); $\delta = 0$ (red). B. For $\delta < 0$: $\delta = -1$ (cyan); $\delta = -0.3$ (blue); $\delta = -0.1$ (purple); $\delta = 0$ (red). In both panels, all continuous curves for $\delta \ne 0$ concur at one point, which corresponds to the fact that, for both $\delta > 0$ and $\delta < 0$, the stable equilibrium at $q = q^*$ is independent of the magnitude of $\delta$.

In Figure 1B, the inputs are unbiased ($\delta = 0$), and in the absence of crosstalk (q = 1) the inputs segregate completely. As crosstalk increases, the separation between the eigenvalues at first decreases, though the inputs remain completely segregated. However, as crosstalk increases further, the two eigenvalues equalize at the critical quality value $q^* = v/(v - c)$. With further increases in crosstalk, the inputs become completely unsegregated, and the eigenvalues now move apart. This qualitative change at $q^*$ is a bifurcation. Note that although the qualitative behavior only changes at $q^*$, there is a biologically less important quantitative change: the two symmetric equilibrium weight vectors decrease continuously in length as $\sqrt{q - 1/2}$, as q decreases from q = 1 until the bifurcation at $q = q^*$, then remain of unit length for $q < q^*$.
In the slightly biased cases $\delta = -0.2$ and $\delta = 0.5$ (Figure 1A and C), this overall behavior persists, although the eigenvalues always remain distinct, and there is no true bifurcation. Thus in part A, as crosstalk increases, the eigenvalues at first approach each other, and the solution remains almost segregated. At the "pseudocritical" value $q = \frac{(2v+\delta)(2v+\delta-2c) - \delta^2}{(2v+\delta-2c)^2}$ the eigenvalues reach their closest value (in an "avoided crossing") and then start to separate as crosstalk increases further; significantly beyond this pseudocritical value, the outcome is almost unsegregated (see Figures 4 and 5). Of course, for q values very close to this pseudocritical value, desegregation increases very rapidly with increases in crosstalk (see Figures 2 and 5), especially for very small values of $\delta$. Thus even with slight input bias, the overall behavior, switching from segregation to unsegregation at a critical crosstalk value, resembles that seen in the unbiased situation.
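
The location of the avoided crossing can be estimated numerically. The sketch below assumes that the pseudocritical quality is the value of q that minimizes the eigenvalue gap $\sqrt{\Delta}$, with $\Delta = [2qc + (1-q)(2v+\delta)]^2 + (2q-1)\delta^2$; setting $d\Delta/dq = 0$ then gives the closed form used in `q_pseudo`. That closed form, the grid search, and the parameter values are our own assumptions for illustration, not taken from the original computations.

```python
import math

def gap(q, v, c, delta):
    """Eigenvalue separation mu1 - mu2 = sqrt(Delta)."""
    return math.sqrt((2 * q * c + (1 - q) * (2 * v + delta)) ** 2
                     + (2 * q - 1) * delta ** 2)

def q_pseudo(v, c, delta):
    """Stationary point of Delta(q), obtained from d(Delta)/dq = 0."""
    B = 2 * v + delta - 2 * c
    return ((2 * v + delta) * B - delta ** 2) / B ** 2

# Assumed illustrative parameters.
v, c, delta = 1.0, -0.4, 0.2

# Brute-force search over q in (1/2, 1] on a fine grid.
grid = [0.5 + 0.5 * k / 100000 for k in range(1, 100001)]
q_grid = min(grid, key=lambda q: gap(q, v, c, delta))
q_formula = q_pseudo(v, c, delta)
min_gap = gap(q_formula, v, c, delta)
print(q_grid, q_formula, min_gap)
```

For these values the minimum gap is strictly positive and occurs slightly above $q^* = v/(v-c)$; as $\delta \to 0$ the stationary point reduces to $q^*$.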




Figure 5: Equilibria curves in the phase plane, as q changes. The blue curves represent the stable equilibrium locus, and the green curves the saddle equilibrium. A. Plots for a few representative positive $\delta$ values: $\delta = 0.02$ (thin curves), $\delta = 0.2$ (thin dotted curves) and $\delta = 0.5$ (thick curves). All green saddle curves concur at one point (on the vertical axis), and all blue stable curves also concur at a point, corresponding to the fact that the position of the two equilibria is independent of the magnitude of $\delta > 0$. B. Plots for a few representative negative $\delta$ values: $\delta = -0.02$ (thin curves), $\delta = -0.2$ (thin dotted curves) and $\delta = -0.5$ (thick curves). All green saddle curves concur at one point, and all blue stable curves also concur at a point (on the vertical axis), corresponding to the fact that the position of the two equilibria is independent of the magnitude of $\delta < 0$. The arrows along the curves indicate the direction of increasing q.



2.4 Conclusions: mathematical behavior of the 2D system 

Corollary 2.6. For any combination of parameters, the phase plane of the system (1) contains no cycles. Moreover, the system has only two (opposite) attracting equilibria, with attraction basins two open half-planes.

Remark. The result holds more generally for an n-dimensional system, as shown in Appendix 1.

Since we are looking at a 2-dimensional system, this means, according to the Poincaré-Bendixson theorem, that the only attracting sets can be attracting equilibria. The two attracting equilibria of the system (by Proposition 2.3) lie along the invariant line corresponding to the largest eigenvalue of the matrix EC, hence their position (direction and distance to origin) depends on the values of the parameters (in particular on the quality q and bias factor $\delta$). Figure 5 illustrates the evolution of these points in the phase plane for a fixed $\delta \ne 0$, as q increases. (We used Matcont continuation algorithms to numerically estimate the equilibria and draw the equilibrium curves.)

The following two paragraphs summarize the conclusions obtained throughout the previous sections: 

Biased dynamics. When the system is biased (i.e., $\delta \ne 0$) the two eigenvalues of the modified covariance matrix EC are always separated. The phase plane has two pairs of nonzero opposing equilibria, each situated on one of two distinct invariant lines through the origin (i.e., the two eigendirections of EC). The invariant line of slope $z_1$ corresponding to the higher eigenvalue $\mu_1$ of EC contains the pair of opposing attracting equilibria; the invariant line of slope $z_2$ corresponding to the lower eigenvalue $\mu_2$ of EC separates their two basins of attraction and also contains the pair of opposing saddles. As the parameter q increases, the invariant lines rotate (clockwise if $\delta > 0$ and counter-clockwise if $\delta < 0$) in a continuously differentiable manner, with an angular speed that depends on q. This rotation gets arbitrarily fast (e.g., at its point of maximal rotational speed) as $\delta \to 0$.

Unbiased dynamics. When the system is unbiased (i.e., $\delta = 0$) the two eigenvalues of the modified covariance matrix EC collide at the critical value of the quality parameter $q = q^*$. For any $q \ne q^*$, the phase plane has two pairs of nonzero opposing equilibria, each situated on one of two distinct invariant lines (of slopes $\pm 1$) through the origin. The invariant line corresponding to the higher eigenvalue of EC contains the pair of opposing attracting equilibria; the invariant line corresponding to the lower eigenvalue of EC separates their two basins of attraction and also contains the pair of opposing saddles. As the parameter q increases, the invariant lines remain unchanged, until they swap instantaneously as q traverses the critical state $q = q^*$ (stability-swapping bifurcation). At the bifurcation point, the phase plane has an entire ellipse of half-stable equilibria.

Remark. The codimension 2 bifurcation that occurs at $q = q^*$ in the slice $\delta = 0$ can be considered a limit case of the phase-plane transition sequence obtained when increasing q, when letting $\delta \to 0$ in the biased case. The rotational speed blows up to $\infty$ as $\delta \to 0$, and, in the $\delta = 0$ slice, the rotation becomes instantaneous via what appears to be the bifurcation's "swap" of eigendirections. The evolution of the rotation speed with respect to q as $\delta \to 0$ is further illustrated in Figure 6.




[Figure 6: panels A and B; horizontal axis: quality q.]



Figure 6: Illustration of the evolution of the angles $\theta_{1,2}$ of the invariant lines with the abscissa, as q increases. In both panels, v = 1 and c = -0.4. A. $\delta = 0.5$; B. $\delta = -0.2$. The graphs of the functions are shown in thick lines, $\theta_1$ in blue and $\theta_2$ in green. The graphs of the derivatives are plotted in thin lines, with $d\theta_1/dq$ in blue and $d\theta_2/dq$ in green. On the graphs of the derivatives, we marked with a black star the points corresponding to $q = q^*$, and with a bullet the points of extremum (the inflection points for $\theta_{1,2}$, where the rotational speed is maximal).



3 Alternative models: Euclidean normalization of weights versus the 
Oja model 

The Oja rule is an elegant and classical solution to the well-known problem that unconstrained Hebbian learning
is unstable [13] [33]. It has the biologically appealing feature that it is local, although it does require, somewhat
implausibly, that the "normalizing" adjustment is proportional to the current weight. We have shown that it is still
useful when some crosstalk is present, although the stable norm and the exact direction of the learned weight vector
change. One can imagine various other ways of promoting stability, possibly involving "homeostasis" or "synaptic
scaling" [43] [44], and some studies invoke various combinations of these mechanisms. A less biologically plausible,
nonlocal but extremely simple and highly effective method, which might capture features of any more plausible
scheme and which works even for nonlinear rules, is to impose a specific norm after each weight vector update. Here
we examine how crosstalk affects such "explicit" or "brute" normalization.

As before, Hebb's rule lies at the basis of the weight updates: $\Delta w = \gamma y x$, with $y = w^t x = x^t w$.

In other words: $w(n+1) = w(n) + \gamma y x$. As in the Oja model, we can think of Hebbian inspecificity being formalized
as a stochastic error matrix $\mathcal{E}$, so that, at each time step:

\[ w \to w + \gamma y \mathcal{E} x \]

Taking expectation of both sides and renaming $w = \langle w \rangle$ (the long-term average of the weight vector), $C = \langle x x^t \rangle$
(the correlation matrix of the input distribution) and $E = \langle \mathcal{E} \rangle$ (the average error matrix), we obtain the iteration:

\[ w \to w + \gamma \langle \mathcal{E} x x^t \rangle w = w + \gamma E C w. \]

We normalize to keep $\|w\| = 1$, and make no further approximations to implement this normalization biologically.
The new iteration function that describes the average iterative process, with errors, becomes:

\[ f(w) = \frac{w + \gamma E C w}{\|w + \gamma E C w\|} \]

where the "modified" covariance matrix is, as before, EC; unlike in the Oja case, EC is now involved in the
normalization step as well. Notice that, since EC has positive eigenvalues, the matrix $I + \gamma EC$ is nonsingular, hence
$f$ is defined for all $w \in \mathbb{R}^n \setminus \{0\}$. The rest of the section is dedicated to discussing the position and stability of the
equilibria of this new system, for which the direct normalization confines the trajectories to the unit circle.
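
As a concrete illustration of this averaged iteration, the sketch below applies the map repeatedly in two dimensions. It is a minimal example under stated assumptions: the forms E = [[q, 1-q], [1-q, q]] and C = [[v, c], [c, v]], the parameter values, the starting point, and the function name `normalized_hebb_step` are all chosen for illustration.

```python
import numpy as np

def normalized_hebb_step(w, EC, gamma):
    """One step of the averaged rule: (w + gamma*EC w) / ||w + gamma*EC w||."""
    u = w + gamma * (EC @ w)
    return u / np.linalg.norm(u)

# Illustrative parameters; the forms of E and C are assumptions.
q, v, c, gamma = 0.85, 1.0, -0.4, 0.1
E = np.array([[q, 1.0 - q], [1.0 - q, q]])
C = np.array([[v, c], [c, v]])
EC = E @ C

w = np.array([1.0, 0.2])  # arbitrary nonzero starting point
for _ in range(500):
    w = normalized_hebb_step(w, EC, gamma)

print("limit of the iteration:", w)
```

For this choice of q (above the critical quality), the iterates approach the unit vector along (1, -1), the direction of the larger eigenvalue of EC; the sign of the limit depends on the starting point.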

In order to slightly simplify the notation, we call $A = EC$, $u = w + \gamma EC w$ and $a = \|u\|$, notation which we will
use whenever it is convenient. We want to see if the long-term evolution of w predicted by this model is comparable
with the behavior of our stochastic, discrete simulations for the case where the "ratio" normalization was replaced by a
"subtractive" Taylor approximation of it (see also [5rJ]).



The vector w is a fixed point of $f(w)$ iff $w + \gamma A w = a w$, i.e. w is a unit eigenvector of A (with the Euclidean
norm). To establish the stability, we compute the Jacobian matrix of $f$ at each fixed point.



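Fix $j \in \{1, \dots, n\}$. Then, for any $i \neq j$:

\[ \frac{\partial u_i}{\partial w_j} = \frac{\partial}{\partial w_j}\left(w_i + \gamma [Aw]_i\right) = \gamma A_{ij} \]

When $i = j$, we have similarly:

\[ \frac{\partial u_j}{\partial w_j} = \frac{\partial}{\partial w_j}\left(w_j + \gamma [Aw]_j\right) = 1 + \gamma A_{jj} \]

Hence, overall:

\[ \frac{\partial}{\partial w_j}\|u\|^2 = 2u_j(1 + \gamma A_{jj}) + 2\gamma \sum_{i \neq j} u_i A_{ij} = 2u_j + 2\gamma \sum_i u_i A_{ij} = 2u_j + 2\gamma [A^t u]_j. \]

In matrix form:

\[ \frac{\partial}{\partial w}\|u\|^2 = 2\gamma A^t u + 2u \tag{4} \]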



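Now, fix $i \in \{1, \dots, n\}$. For $j \neq i$, we have:

\[ \frac{\partial f_i}{\partial w_j} = \frac{\gamma A_{ij}\|u\| - u_i \|u\|^{-1}[\gamma A^t u + u]_j}{\|u\|^2} \]

For $j = i$, we have:

\[ \frac{\partial f_i}{\partial w_i} = \frac{(1 + \gamma A_{ii})\|u\| - u_i \|u\|^{-1}[\gamma A^t u + u]_i}{\|u\|^2} \]

Rewritten in matrix form:

\[ \frac{\partial f}{\partial w} = \frac{\gamma}{a} A - \frac{1}{a^3}\left(\gamma u u^t A + u u^t\right) + \frac{1}{a} I = \frac{1}{a}\left(I - \frac{1}{a^2} u u^t\right)(\gamma A + I) \tag{5} \]

where I is the appropriate size identity matrix.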

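At any fixed point w (for which automatically $\|w\| = 1$ and $Aw = \lambda_w w$, where $1 + \lambda_w \gamma = a$), we have that:

\[ u u^t = (w + \gamma A w)(w + \gamma A w)^t = (1 + 2\gamma\lambda_w + \gamma^2\lambda_w^2)\, w w^t = (1 + \lambda_w\gamma)^2\, w w^t \]

The Jacobian at a fixed point w can then be simplified to:

\[ \frac{\partial f}{\partial w} = \frac{1}{a}(I - w w^t)(\gamma A + I) \tag{6} \]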



We calculate:

\[ \frac{\partial f}{\partial w}(w) = \frac{1}{a}(I - w w^t)(\gamma\lambda_w + 1)\, w = \frac{1}{a}\left(w - w(w^t w)\right)(\gamma\lambda_w + 1) = 0 \]

Complete w to a basis of eigenvectors of A (not necessarily mutually orthogonal). Let $v \neq w$ be any of the vectors
in this basis (with eigenvalue $\lambda_v$), and consider $z = v - [w^t v] w$, the projection of v on the orthogonal complement
of w. Then:

\[ (\gamma A + I)(z) = (\gamma\lambda_v + 1)\, v - (\gamma\lambda_w + 1)[w^t v]\, w \]

Hence

\[ \frac{\partial f}{\partial w}(z) = \frac{1}{a}(\gamma\lambda_v + 1)(v - [w^t v] w) = \frac{1}{a}(\gamma\lambda_v + 1)\, z = \frac{\gamma\lambda_v + 1}{\gamma\lambda_w + 1}\, z \]

A normalized eigenvector w of EC is stable as a fixed point of the system if all the eigenvalues of the Jacobian
$\partial f / \partial w$ at w are less than one in absolute value:

\[ \left| \frac{\gamma\lambda_v + 1}{\gamma\lambda_w + 1} \right| < 1 \]

Since all eigenvalues of EC are positive (recall that EC is diagonalizable with the dot product $\langle \cdot, \cdot \rangle_C$), this is
equivalent to $\frac{\gamma\lambda_v + 1}{\gamma\lambda_w + 1} < 1$, and thus to $\lambda_w > \lambda_v$ for every $v \neq w$.

In conclusion: The system has stable fixed points iff the modified correlation matrix EC has a maximal
eigenvalue of multiplicity one. Then, a point w is a stable fixed point of the system iff it is a unit eigenvector of EC
corresponding to the unique maximal eigenvalue of EC.
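
The stability computation can be cross-checked numerically by comparing a finite-difference Jacobian of the normalized map with the closed form $(1/a)(I - w w^t)(\gamma A + I)$ at a fixed point. This is a sketch under assumptions: the two-dimensional forms E = [[q, 1-q], [1-q, q]] and C = [[v, c], [c, v]], the parameter values, and the helper names are illustrative.

```python
import numpy as np

def f(w, A, gamma):
    """Averaged update with explicit normalization."""
    u = w + gamma * (A @ w)
    return u / np.linalg.norm(u)

def numerical_jacobian(w, A, gamma, eps=1e-6):
    """Central-difference Jacobian of f at w."""
    n = len(w)
    J = np.zeros((n, n))
    for j in range(n):
        step = np.zeros(n)
        step[j] = eps
        J[:, j] = (f(w + step, A, gamma) - f(w - step, A, gamma)) / (2.0 * eps)
    return J

# Illustrative parameters; the forms of E and C are assumptions.
q, v, c, gamma = 0.85, 1.0, -0.4, 0.1
E = np.array([[q, 1.0 - q], [1.0 - q, q]])
C = np.array([[v, c], [c, v]])
A = E @ C

# Fixed point: unit eigenvector of A along (1, -1), with eigenvalue lam_w.
w = np.array([1.0, -1.0]) / np.sqrt(2.0)
lam_w = (v - c) * (2.0 * q - 1.0)
lam_v = v + c  # eigenvalue of the other eigenvector, (1, 1)
a = 1.0 + gamma * lam_w

J_num = numerical_jacobian(w, A, gamma)
J_formula = (np.eye(2) - np.outer(w, w)) @ (gamma * A + np.eye(2)) / a
multipliers = np.sort(np.linalg.eigvals(J_num).real)
predicted = np.sort([0.0, (gamma * lam_v + 1.0) / (gamma * lam_w + 1.0)])

print("numerical multipliers:", multipliers)
print("predicted multipliers:", predicted)
```

Here the nonzero multiplier is below one because lam_w exceeds lam_v, so this fixed point is stable for the chosen q.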

It is now clear that the phase space of this system, although not dynamically equivalent to the phase space of
the corresponding Oja model, is very similar to it. Disregarding the origin (which is not in the domain of one, but is
a repelling fixed point for the other), the other fixed points have the same qualitative behavior (stability) for both
systems, assuming $\gamma$ is sufficiently small. Moreover, the stability transitions occur at the same bifurcation points
(where the eigenvalues of EC collide with each other), and the bifurcation phase-planes are themselves similar.

In the case of the two-dimensional model discussed in this paper, the eigenvalue swap occurs as before when
$q = q^* = \frac{v}{v - c}$. For example, in the unbiased case $\delta = 0$, the bifurcation phase plane at $q = q^*$ again exhibits an
ellipse attractor. Indeed, at the codimension 2 bifurcation $q = q^*$, the iteration function becomes $f(w) = \frac{w}{\|w\|}$, which maps
radially any w in the plane to the unit circle, and keeps it fixed thereafter.
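
The reduction to radial projection can be seen by writing EC out explicitly. As a worked step, assume the crosstalk matrix has the form $E = \begin{pmatrix} q & 1-q \\ 1-q & q \end{pmatrix}$ (an assumption about the form used earlier in the paper, not restated in this section) and take the unbiased covariance $C = \begin{pmatrix} v & c \\ c & v \end{pmatrix}$. Then

\[ EC = \begin{pmatrix} c + q(v - c) & v - q(v - c) \\ v - q(v - c) & c + q(v - c) \end{pmatrix}. \]

At $q = q^* = \frac{v}{v - c}$ the off-diagonal entries vanish and the diagonal entries equal $v + c$, so that $EC = (v + c) I$ and

\[ f(w) = \frac{(1 + \gamma(v + c))\, w}{\|(1 + \gamma(v + c))\, w\|} = \frac{w}{\|w\|}, \]

where the last equality uses $1 + \gamma(v + c) > 0$, which holds because the eigenvalue $v + c$ of EC is positive.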

The next section shows phase plane simulations for both models, in the more realistic situation of stochastic
weight updates, in discrete time and at a finite learning rate.

4 Stochastic models. Simulations and predictions



Here we briefly study the more biologically realistic situation in which the weights update stochastically with a small
finite learning rate $\gamma$, driven by each individual input, rather than by the mean statistics in the negligible learning
rate limit. In particular we study whether the convergence to eigenvector equilibria, and the transitions in dynamics
between different values of the parameter q, still occur as in the deterministic model.

Figure 7: Behavior of stochastic weight updates for a biased input distribution $\delta \neq 0$. A. A discrete
input sample ($N = 4000$) was drawn out of an input distribution with $v = 1$, $c = -0.4$, $\delta = 1$, and used to update
the weights. B. Depending on its initial state, the weight vector stabilizes towards small stochastic fluctuations
around either one of the attracting equilibria (the pair of appropriately normalized eigenvectors corresponding to the
larger eigenvalue of EC). C. The corresponding iterations are shown in the case of exact normalization at each step
(fewer iterations are shown in this case, since more weights, all living on the unit circle, would obstruct the clarity of
the figure). In all three panels, the points were colored update-chronologically from red to blue. We used the critical
quality $q = 1/1.4 \approx 0.71$.

Our numerical simulations show, as expected, that convergence is conserved, in the following sense: when a pair
of attracting equilibria exists for the deterministic system (i.e., EC has distinct eigenvalues), the discrete sequence of
updating $\omega$ eventually stabilizes to small, stochastic fluctuations around one of these two equilibria (which are, as we
recall, the appropriately normalized eigenvectors corresponding to the larger eigenvalue of EC). This is illustrated
in Figures 8, 9a and 9c. In Figure 7, x is drawn out of a biased distribution of inputs (shown on the left), for which
the two eigenvalues of EC are guaranteed to be distinct for any value of q, in particular for the value chosen here
($q = q^* = 1/1.4$). In Figure 8 the inputs are unbiased, so the same remark applies only if $q \neq q^*$. In Figure 8, we
show both the case $q > q^*$, in which the attracting vectors are $\pm\sqrt{q - 1/2}\,(1, -1)^T$, and the case
$q < q^*$, in which the attracting vectors are $\pm\frac{1}{\sqrt{2}}(1, 1)^T$. In both cases, the stochastic update settles to fluctuations
about either one of these vectors, depending on the initial conditions.

Figure 8B illustrates the unbiased case corresponding to the codimension 2 bifurcation in the deterministic
dynamics; that is, when $q = q^*$. Recall that, in the deterministic phase-plane, this case was characterized by an ellipse
of neutrally stable equilibria, so that each initial condition would converge radially towards a unique nonisolated
equilibrium on this curve. This situation changes in the model driven by stochastic updates. An initial weight vector
$\omega$ will quickly be attracted towards the ellipse; however, the orbit does not fluctuate around a particular point on
the curve, but rather perpetually drifts along the curve, eventually covering densely the entire ellipse.
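
The kind of stochastic simulation described in this section can be sketched as follows. This is not the authors' simulation code: it assumes zero-mean Gaussian inputs with covariance C = [[v, c], [c, v]], a fixed (non-fluctuating) crosstalk matrix E = [[q, 1-q], [1-q, q]], the per-input update w <- w + gamma*(y E x - y^2 w) suggested by the averaged Oja field, and a learning rate smaller than the one quoted in the figure captions. The function name and starting point are hypothetical.

```python
import numpy as np

def simulate_oja_crosstalk(q, gamma=0.01, n_steps=20000, v=1.0, c=-0.4, seed=0):
    """Stochastic Oja updates with crosstalk: w <- w + gamma*(y*E x - y^2 * w).

    Assumptions: E = [[q, 1-q], [1-q, q]], zero-mean Gaussian inputs with
    covariance C = [[v, c], [c, v]], and the same E at every update.
    """
    rng = np.random.default_rng(seed)
    E = np.array([[q, 1.0 - q], [1.0 - q, q]])
    C = np.array([[v, c], [c, v]])
    xs = rng.multivariate_normal(np.zeros(2), C, size=n_steps)
    w = np.array([0.5, 0.1])  # arbitrary initial weights
    trajectory = np.empty((n_steps, 2))
    for t, x in enumerate(xs):
        y = w @ x                              # postsynaptic output
        w = w + gamma * (y * (E @ x) - y**2 * w)
        trajectory[t] = w
    return trajectory

q = 0.85  # above the critical quality 1/1.4
traj = simulate_oja_crosstalk(q)
w_late = traj[-5000:].mean(axis=0)  # average over the late part of the run
print("late-time average weights:", w_late)
print("predicted equilibria: +/-", np.sqrt(q - 0.5) * np.array([1.0, -1.0]))
```

Calling the same function with q set to the critical value is one way to explore the drift along the ellipse described above, although the quantitative behavior will depend on the assumed input distribution and learning rate.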

5 Discussion 

We have proposed [11] [1] that a central problem for biological learning is that the activity-dependent processes
that lead to connection strength adjustments cannot be completely synapse specific, because they must obey the
laws of physics. This truism provides a new viewpoint: it raises the possibility that sophisticated learning, such as
presumably occurs in the neocortex, is enabled as much by special machinery for enhancing specificity, as by special
algorithms [1]. We have suggested that these plasticity errors are analogous to mutations, and that cortical circuitry
might reduce such errors, just as "proofreading" reduces DNA copying mistakes. In particular, it seems possible
that the key to overcoming the curse of dimensionality that underlies difficult, and apparently almost intractable,
learning problems lies not just in finding good approximations, architectures and techniques, but also in perfecting
the relevant biological plasticity apparatus. Indeed, it seems possible that problems of survival and reproduction are
so diverse that no single algorithm can solve them all, so that no "universal" or "canonical" cortical circuit would
be expected. In these circumstances, as Rutherford once said about physics, neuroscience would become a type of
stamp collecting. However, if every specialized algorithm relies on extraordinarily specific synaptic weight adjustment,
then finding machinery that allows such specificity would indeed be tantamount to discovering new neurobiological
general principles, somewhat along the lines that established the main framework for modern biology (Darwinian
evolution, Mendelian genetics, DNA structure and function, replication mechanisms etc). We have speculated that
an important part of such machinery, at least in the neocortex, might lie outside the synapse itself, in the form
of complex circuitry performing a proofreading operation analogous to that procuring accuracy for polynucleotide
copying [1] [4] [2]. However, such machinery would be less necessary if update inaccuracy merely degraded learning,
rather than preventing it. In particular, even if temporarily unfavorable (e.g., "noisy") input statistics led to imperfect
learning because of Hebbian inspecificity, the degraded weights might still be a useful starting point for better learning
when input statistics improve. On the other hand, if inspecificity completely prevented even partial learning, then
rapid and successful learning from newly favorable statistics might be impossible. These considerations have impelled
us to examine the effect of Hebbian "crosstalk" in various classical models of unsupervised learning, using both
linear [36] and nonlinear rules [11] (see also [17]).

Figure 8: Differences in stochastic behavior when q is varied, in the unbiased input case $\delta = 0$ (compare
with Figure 3). A discrete number of input vectors $x(t) = (x_1(t), x_2(t))$ are drawn from a distribution with
covariance matrix C, with $v = 1$, $c = -0.4$ (so that the critical quality value $q^* = 1/1.4 \approx 0.71$). The weights
$\omega = (\omega_1(t), \omega_2(t))$, adjusting with a small learning rate $\gamma = 0.1$, are plotted in the $(\omega_1, \omega_2)$ plane, with the color of
the points changing chronologically from red to blue. The top panels show the behavior of the Oja model, while the
bottom panels, for corresponding parameters, show the behavior for exact normalization of weights at each step. A.
For good transmission quality $q = 0.85 > q^*$, $\omega$ converges in the long term to a state of small fluctuations around
either of $\pm\sqrt{q - 1/2}\,(1, -1)^T$, depending on the initial state. The plot illustrates the trajectories for two initial states,
each stabilizing around one of these opposite eigenvectors. B. For critical transmission quality $q = q^*$, $\omega$ converges
to fluctuations around the ellipse of neutrally attracting equilibria, but will perpetually drift around, filling the
ellipse, driven by input fluctuations from the mean statistics, without remaining asymptotically near any particular
equilibrium state. C. For poor transmission quality $q = 0.6 < q^*$, $\omega$ converges in the long term to a state of small
fluctuations around $\pm\frac{1}{\sqrt{2}}(1, 1)^T$, depending on the initial state. The plot illustrates the trajectories for two initial
states, each stabilizing around one of these opposite eigenvectors.

5.1 Separate but equal: segregation without bias 

In this paper we extended our previous study [36] of the effect of crosstalk on the simple linear Hebbian model
of Oja to situations approaching the "unbiased" case where all inputs have the same statistical distribution. This 
case has often been invoked in discussions of the emergence of ocular dominance wiring and other forms of neural 
development, but it might also apply to any situation in which sets of inputs disconnect completely, or "segregate," 
to form pruned wiring patterns that are then "sculpted" by a more subtle synaptic learning process (of course in the 
present model weights and activities can be negative; we interpret negative weights as disconnections). For the case 
of visual input, it seems likely that statistics would be similar, and positively correlated, for the two eyes, which look 
at the same world, and it is well known [13] that a linear Hebb rule with unbiased inputs, under either implicit or 
explicit normalization, leads to the symmetric, equal-weight, and thus apparently unbiological, outcome. A possible
but rather unbiological solution to this is to use a "subtractive" normalization scheme, although this also requires
imposing weight limits [32]. It has been shown that a wide variety of nonlinear rules [15], including the BCM rule [8]
and STDP [TB], can lead to ocular segregation under unbiased statistics. The key point is that segregated states can
be created by typically nonlinear, "symmetry-breaking" mechanisms even when the inputs themselves do not favor 
particular segregated outcomes. Indeed, the absence of bias could be characteristic of development, as opposed to 
"learning," insofar as these two notions are distinct. 

A natural question would be: if such segregated outcomes are an important part of normal development (which 
then constrain subsequent, more detailed, "refining," plasticity processes, including learning), how could the deter- 
mining "unbiased" statistics arise, and conversely, how would plasticity errors, such as crosstalk, or other alterations 
in the form of the rule, affect the outcome? In particular we show here that, unsurprisingly, crosstalk tends to 
prevent segregation, especially when the inputs are close to unbiased. This might set a limit to the use of symmetry 
breaking to generate specific wiring, or require special specificity-enhancing circuitry, such as "proofreading", even 
during development. At the very least it suggests that internally generated patterns driving segregation, such as
negative correlations, might have to be quite strong to overcome the desegregating effect of inevitable crosstalk. 

Before exploring this further, we comment briefly about "unbias" in relation to Hebbian learning. Although here
we focus on lack of bias in the second order statistics, one can also postulate unbias at all orders, an assumption
which greatly simplifies the study of nonlinear Hebbian plasticity, essentially eliminating the possibility of learning
and restricting analysis to development. To what would unbiased high-order statistics correspond? It seems that
they correspond to the radially symmetric distributions recently considered by Lyu and Simoncelli [28], where the
equal-density contours of the joint pdf are nested hyperspheres with non-Gaussian spacings. One might expect that
with completely unbiased (spherical) input statistics no particular direction in weight space would be favored and
therefore the outcomes would be either symmetric (equal weights), or broken symmetric (various combinations of
opposite but equal magnitude weights); the particular set of outcomes would be determined by the higher-order
correlations, and could be quite complicated. Indeed, Elliott [TS] finds that segregated outcomes are quite typical of
nonlinear Hebbian rules with unbiased statistics and shows that crosstalk can induce bifurcations in these cases [17].

Recently, it has been suggested that the Oja rule (even without crosstalk) and Eigen's replication/mutation 
equation might be "isomorphic" [20] [19]. Indeed both equations describe normalized growth processes. However, our
work shows that the Oja equation only shows a bifurcation at a critical crosstalk value in very narrow conditions. We 
suggest that the important analogy lies less in detailed mathematical equivalencies, and more in the fundamental need 
for accuracy in elementary biological processes. In particular, it's clear that superaccurate polynucleotide copying 
underlies Darwinian evolution, and similarly superaccurate Hebbian plasticity might be needed for neural learning. 

5.2 Effect of crosstalk on linear learning 

The analysis reported here essentially shows that the well known bifurcation that occurs in linear Hebbian learning as 
unbiased negative correlations become positive (from segregated to unsegregated states) still occurs in the presence of 
crosstalk, but at a new, crosstalk-dependent critical negative correlation level. This effect is quite intuitive: crosstalk 
favors the unsegregated state, and therefore allows the switch to occur at negative correlation, rather than at zero 
correlation. Of course this situation changes dramatically as soon as any degree of bias is introduced, since now the 
eigenvalues of C become distinct, and our previous analysis [36] applies: crosstalk produces a smooth change in the
direction of the learned weights (the dominant eigenvector of EC). Our present analysis attempts to characterize
the relation between these two regimes. In particular, we show that the smooth change can be very rapid when bias
is weak.

The change in the normalization produced by crosstalk in the Oja model is largely irrelevant, and indeed one still 
sees the same behavior with explicit normalization (Section 3). Our analysis also gives insight into the codimension
two bifurcation that occurs at the critical quality $q^*$, via an ellipse of half-stable equilibria. The motion towards the
ellipse becomes extremely rapid (Figures 3B and 9), which permits the exchange of stability between the two invariant
lines (Figure 2). This rapid motion shows up in simulations (and presumably in biological realizations) as very
"noisy" weights as the threshold crosstalk value is neared. 

Of course a true bifurcation is only seen for unbiased inputs and for negligible learning rates. However, the
behavior remains practically indistinguishable from a bifurcation even with slightly biased inputs and finite learning
rates. An example was already discussed in our previous paper (see Figure in [36]). A similar situation occurs with
models of phase transitions: a true bifurcation of the dynamics only occurs in the "thermodynamic limit," but this is
effectively established even for quite small systems [39]. We have previously called attention to the analogy between
Hebbian learning and molecular evolution [3] Qj|], with crosstalk playing the role of mutation. In Eigen's evolution
model [2], the transition from the ordered, living, state to the disordered, chemical, state is quite sharp even for
polynucleotide lengths of about 50, though a true phase transition (identical to that of the surface of the 2-dimensional
Ising model) is only seen with unlimited chains [31]. Interestingly, the model becomes easiest to analyze in this
limit, and the relevant dimensionless control parameter is simply the product of the mutation rate and the
chain length (for binary strings). Although we analyzed here the n = 2 case, we assume that weights are specified
with unlimited bit resolution (i.e., reals). In this case the dimensionless control parameter, equivalent to that in the
thermodynamic limit of the Eigen model, is q. In the standard Eigen model, the mutation rate is the same at all
chain positions. In the next section we discuss the analogous concept for Hebbian learning.

5.3 The error matrix 

Throughout this paper we assume that the Hebbian adjustment of any weight is equally affected by error, and that
this error does not depend either on the strength of that weight or its identity. Such "isotropicity" seems a reasonable first
assumption, like neglecting bumps on an inclined plane in mechanics. However, it does appear to fly in the face
of biological reality. First, stronger synapses are also bigger, and also require higher spine neck conductances and
therefore presumably are less well isolated chemically [26]. However, such expected "weight-dependence" might only
be a second-order effect, because in order to ensure that LTP is "Hebbian," the spine neck resistance must always
be sufficiently low, even at "silent," AMPAR-less zero-strength synapses, so that the essential back-propagating
spike effectively invades the spine head [59] during the peak NMDAR opening. Second, crosstalk between individual
synapses is a relatively local, not global, phenomenon [9, 23]. However, during learning individual synapses appear and
disappear, which will smear details. Furthermore connections are made up of many individual synapses scattered
over much of the dendritic tree, which will also smear detail [36]. Recent work shows that feed-forward cortical
connections carrying similar information do not "cluster" on dendritic segments, invalidating the argument that
local crosstalk could promote useful clustering [23].

5.4 Relevance to ocular dominance and general developmental mechanisms 

A useful though rather fuzzy distinction can be drawn between developmental mechanisms which generate sets of 
connections ("circuits"), or, perhaps, "incipient" or "potential" connections [2 HO] that can be made actual without 
axo-dendritic rewiring merely by adding postsynaptic spines or presynaptic "drumsticks" [5] [3S], and "learning," 
which refines (perhaps in crucial ways) the overall framework established by development. This distinction is related 
to that between "Nature" and "Nurture," or, in the context of Chomskyan linguistics, "principles" and "parameters."
The Oja model encapsulates this distinction in minimal form: by definition when the inputs are unbiased there can 
be no learning, and only two outcomes are possible, which we call segregated or unsegregated. The classic biological 
example is that in many species early in development a geniculate axon diffusely innervates a patch of layer IV 
of cortex (though it does not necessarily contact all the neurons whose dendrites ramify in that patch), but then 
retracts from stripes within that patch that become selectively innervated by axons carrying signals
from the other eye. Cells within a stripe then become largely monocular, although they develop different selectivities
for different stimulus features such as orientation. In the Oja model segregation appears in response to unbiased 
(or, effectively, nearly unbiased) inputs at a critical level of negative correlation, which depends on the degree of
crosstalk. In real animals segregation appears before the onset of visual experience, and is thought to be driven by
unbiased inputs generated by spontaneous firing. While one might expect crosstalk to hinder segregation (since it
tends to equalize weights), our results show this is not quite correct in the Oja model: it merely shifts the critical
degree of (unbiased) correlation required. Various proposals exist for how such inputs can induce segregation even
when correlations are positive [3H[TS] and it is likely that crosstalk will also have the same weak effect here. Indeed,
Elliott [17] has shown that while crosstalk induces a bifurcation from segregation to unsegregation in a
weight-dependent model, the critical value (his equation 3.8) can be shifted to favorable values with suitable correlation
values. Thus, the endogenous developmental machinery that creates circuits probably does not require great Hebbian
accuracy (and might not require Hebbian machinery at all [12] [35]). If the aforementioned postulated layer VI
proofreading circuit [3] underlies accuracy, it would not be needed until learning begins, consistent with evidence 
that the final stages of layer VI circuitry (for example, feedback to relay cells) is late to develop. Indeed, much of 
the initial pruning that takes place in development might serve to improve the accuracy of proofreading circuitry 
essential for true learning. 

Once detailed, and biased, sensory input occurs, it can drive quantitative adjustments in the already correctly 
segregated circuits, involving both synapse-strength change and stabilization and un-silencing of new spines (and 
removal of weak synapses). However, even in the highly simplified Oja model, appropriate adjustment now requires 
great accuracy, and therefore presumably "proofreading," especially when correlation bias is weak. 

5.5 Normalization and error 

We have assumed in both our papers on the linear Hebb rule that the effect of crosstalk is solely on the Hebbian part 
of the rule, not the normalizing component. However, if normalization, or some other process that stabilizes Hebbian 
learning, is biologically necessary, then it is presumably also subject to imperfections such as crosstalk. There are 
basically two ways to add this other form of crosstalk to stabilized Hebbian learning rules. In the context of a single 
neuron model, one could simply apply a second, different, crosstalk matrix, say F, to the stabilizing term. However, 
if F = E (because the geometry underlying such errors is the same in each case) such normalization crosstalk cancels 
the overall effect. Even if the pattern of crosstalk at each update is described by fluctuating matrices $\mathcal{E}$ and $\mathcal{F}$,
whose averages equal E and F, which do not exactly cancel, one might expect they would on average. Could this be 
a way to eliminate errors? 

The other possible way that errors could affect normalization would be if the underlying normalization mechanism 
were sufficiently different that the average geometry differed. Although the Oja model allows negative activities, firing 
rates can only be positive, and it is tempting to suppose that negative signals are carried in special "off" channels 
whose positive weights represent negative ones. In this equivalence, the Hebb part of the rule reflects LTP and the
normalization part, LTD. In the cortex, LTP seems to be postsynaptic and LTD presynaptic. If the update leaks 
presynaptically, within the axon, it will affect a different set of synapses, ones that mostly form onto a different 
postsynaptic cell. To evaluate how such errors might affect learning, one needs a multi-unit model. 

6 Conclusion 

Generically the inspecific Oja rule does not show bifurcations with variation in the crosstalk parameter. In this paper 
we analyze an interesting special case which does show a bifurcation: when the input statistics are unbiased. We also 
describe the behavior in the vicinity of this special case, which is practically indistinguishable from a bifurcation. 
Essentially in this region "learning" changes rather abruptly from being dominated by second-order input statistics 
(at sufficiently low crosstalk) to being dominated by the internal pattern of crosstalk itself. However, we regard 
this behavior as being biologically rather uninteresting, since synaptic mechanisms are presumably accurate enough 
that it never occurs. The one exception would be during development, where near-unbiased statistics might be used 
by the brain to induce initial selective wiring. Our results suggest that even in this case, high Hebbian accuracy 
might be required. However, extreme accuracy is probably most essential for nonlinear learning from higher-order 
statistics [II I IP? ].

Appendix 1. A few detailed proofs



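Lemma. $Df^E_w = \gamma\left[EC - 2w(Cw)^T - (w^T C w) I\right]$

Proof. Call $g(w) = (w^T C w)\, w$, so $f^E(w) = \gamma\left[ECw - g(w)\right]$, with components

\[ g_i(w) = (w^T C w)\, w_i \]

If $i \neq j$:

\[ \frac{\partial g_i}{\partial w_j}(w) = \frac{\partial}{\partial w_j}\Big(\sum_{k,l} C_{kl} w_k w_l\Big) w_i = 2\Big(\sum_k C_{kj} w_k\Big) w_i = 2[Cw]_j w_i \]

If $i = j$:

\[ \frac{\partial g_i}{\partial w_i}(w) = \frac{\partial}{\partial w_i}\Big(\sum_{k,l} C_{kl} w_k w_l\Big) w_i + \sum_{k,l} C_{kl} w_k w_l = 2\Big(\sum_k C_{ki} w_k\Big) w_i + w^T C w = 2[Cw]_i w_i + w^T C w \]

So:

\[ Dg_w = 2w(Cw)^T + (w^T C w) I \]

□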



Proposition 1.2. Suppose EC has a multiplicity one largest eigenvalue. An equilibrium w (i.e., by Proposition
1.1, an eigenvector of EC with eigenvalue $\lambda_w$, normalized so that $\|w\|_C = \lambda_w$) is a local hyperbolic attractor for the
system (1) iff it is an eigenvector corresponding to the maximal eigenvalue of EC.

Proof. Fix an eigenvector w of EC, with $ECw = \lambda_w w$. Then:

\[ Df^E_w\, w = \gamma\left[ECw - 2w(Cw)^T w - (w^T C w) w\right] = \gamma\left[-2 w\, w^T C w\right] = -2\gamma\lambda_w w \]

Recall that the vector w can be completed to a basis B of eigenvectors, orthogonal with respect to the dot product
$\langle \cdot, \cdot \rangle_C$. Let $v \in B$, $v \neq w$, be any other arbitrary vector in this basis, so that $ECv = \lambda_v v$, and $\langle w, v \rangle_C = w^T C v = 0$.
We calculate:

\[ Df^E_w\, v = \gamma\left[ECv - 2 w\, w^T C v - \lambda_w v\right] = \gamma\left[(\lambda_v - \lambda_w) v - 2\langle w, v \rangle_C\, w\right] = -\gamma[\lambda_w - \lambda_v]\, v \]

So B is also a basis of eigenvectors for $Df^E_w$. The corresponding eigenvalues are $-2\gamma\lambda_w$ (for the eigenvector w) and
$-\gamma[\lambda_w - \lambda_v]$ (for any other eigenvector $v \in B$, $v \neq w$). An equivalent condition for w to be a hyperbolic attractor
for the system is that all the eigenvalues of $Df^E_w$ are < 0. Since the learning rate $\gamma$ and the eigenvalue $\lambda_w$ are
both > 0, this condition is further equivalent to having $-\gamma(\lambda_w - \lambda_v) < 0$, for all $v \in B$, $v \neq w$. In conclusion, an
equilibrium w is a hyperbolic attractor if and only if $\lambda_w > \lambda_v$, for all $v \neq w$ (i.e. $\lambda_w$ is the maximal eigenvalue,
or in other words if w is in the direction of the principal eigenvector of EC). □
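
The Lemma and the eigenvalues used in this proof can be checked numerically in the two-dimensional unbiased case. The sketch below is illustrative: the forms E = [[q, 1-q], [1-q, q]] and C = [[v, c], [c, v]] are assumptions, as are the parameter values and the helper names.

```python
import numpy as np

def oja_field(w, E, C, gamma):
    """Averaged inspecific Oja field: gamma*(EC w - (w^T C w) w)."""
    return gamma * (E @ C @ w - (w @ C @ w) * w)

def oja_jacobian(w, E, C, gamma):
    """Jacobian from the Lemma: gamma*(EC - 2 w (Cw)^T - (w^T C w) I)."""
    n = len(w)
    return gamma * (E @ C - 2.0 * np.outer(w, C @ w) - (w @ C @ w) * np.eye(n))

def finite_difference_jacobian(w, E, C, gamma, eps=1e-6):
    """Central-difference Jacobian of oja_field, used to check the Lemma."""
    n = len(w)
    J = np.zeros((n, n))
    for j in range(n):
        step = np.zeros(n)
        step[j] = eps
        J[:, j] = (oja_field(w + step, E, C, gamma)
                   - oja_field(w - step, E, C, gamma)) / (2.0 * eps)
    return J

# Illustrative parameters; the forms of E and C are assumptions.
q, v, c, gamma = 0.85, 1.0, -0.4, 0.1
E = np.array([[q, 1.0 - q], [1.0 - q, q]])
C = np.array([[v, c], [c, v]])

lam_w = (v - c) * (2.0 * q - 1.0)  # eigenvalue of EC along (1, -1)
lam_v = v + c                      # eigenvalue of EC along (1, 1)
w = np.sqrt(q - 0.5) * np.array([1.0, -1.0])  # scaled so that w^T C w = lam_w

J = oja_jacobian(w, E, C, gamma)
eigs = np.sort(np.linalg.eigvals(J).real)
expected = np.sort([-2.0 * gamma * lam_w, -gamma * (lam_w - lam_v)])
print("Jacobian eigenvalues:", eigs)
print("expected            :", expected)
```

Both eigenvalues are negative for this q, in line with the proposition, since the chosen equilibrium lies along the principal eigenvector of EC.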
Appendix 2. An extension to higher dimensions



Theorem. Suppose the modified covariance matrix $EC$ has a unique maximal eigenvalue $\lambda_1$. Then the two eigenvectors $\pm w_{EC}$ corresponding to $\lambda_1$, normalized such that $\|w\|_C = \lambda_1$, are the only two attractors of the system. More precisely, the phase space is divided into two basins of attraction, of $w_{EC}$ and $-w_{EC}$ respectively, separated by the subspace $\langle w, w_{EC}\rangle_C = 0$.



Proof. We perform the change of variable $u = \sqrt{C}\,w$, so that $u^T u = w^T C w$. Notice that $\sqrt{C}$ is also a symmetric matrix, and that $w = \sqrt{C}^{-1} u$; the system then becomes:

$$\sqrt{C}^{-1}\dot{u} = EC\sqrt{C}^{-1}u - \big(u^T\sqrt{C}^{-1}C\sqrt{C}^{-1}u\big)\sqrt{C}^{-1}u = E\sqrt{C}\,u - (u^T u)\sqrt{C}^{-1}u$$

or equivalently:

$$\dot{u} = \sqrt{C}E\sqrt{C}\,u - (u^T u)\,u = Au - (u^T u)\,u \qquad (7)$$

where, of course, we defined $A = \sqrt{C}E\sqrt{C}$. Clearly, $A$ is a symmetric matrix, having the same eigenvalues as $EC$. More precisely, $w$ is an eigenvector of $EC$ with eigenvalue $\mu$ iff $\sqrt{C}\,w$ is an eigenvector of $A$ with eigenvalue $\mu$. Moreover, any two distinct eigenvectors $v \neq w$ of $EC$ are known to be orthogonal with respect to $\langle\cdot,\cdot\rangle_C$, hence any two distinct eigenvectors of $A$ are orthogonal in the regular Euclidean dot product: $(\sqrt{C}v)^T(\sqrt{C}w) = v^T\sqrt{C}\sqrt{C}\,w = v^T C w = 0$.

Consider then $v$ to be the principal component of $A$ (i.e., the eigenvector corresponding to its maximal eigenvalue), and let $u = u(t)$ be a trajectory of the system (7). We want to observe the evolution in time of the angle $\theta$ between the variable vector $u$ and the fixed vector $v$.



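$$\cos\theta = \frac{v^T u}{\|v\|\,\|u\|}$$

We differentiate and obtain:

$$-\|v\|\sin(\theta)\,\dot\theta = \frac{(v^T\dot u)\,\|u\| - (v^T u)\,\dfrac{u^T\dot u}{\|u\|}}{\|u\|^2} = \frac{(v^T\dot u)\,\|u\|^2 - (v^T u)(u^T\dot u)}{\|u\|^3} \qquad (8)$$

The numerator of this expression is

$$h(u) = (v^T\dot u)(u^T u) - (v^T u)(u^T\dot u) = (u^T u)\,v^T[Au - (u^T u)u] - (v^T u)\,u^T[Au - (u^T u)u]$$

$$= (u^T u)(v^T A u) - (v^T u)(u^T A u)$$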

We are interested in the sign of $h(u)$; to make our computations simpler, we can diagonalize $A$ in a basis of orthogonal eigenvectors, $A = P^T D P$, where $D$ is the diagonal matrix of eigenvalues and $P$ is an orthogonal matrix whose rows are the eigenvectors. Then:

$$h(u) = [(Pu)^T(Pu)]\,[(Pv)^T D (Pu)] - [(Pv)^T(Pu)]\,[(Pu)^T D (Pu)] = (z^T z)(y^T D z) - (y^T z)(z^T D z)$$

where $y = Pv$ and $z = Pu$, so that $Dy = DPv = \lambda_1 y$ (here $\lambda_1 > \lambda_2 \ge \dots \ge \lambda_n$ are the eigenvalues of $EC$, and the largest one, $\lambda_1$, is assumed to have multiplicity one). Hence:

$$h(u) = (z^T z)(y^T D z) - (y^T z)(z^T D z) = (z^T z)\,\lambda_1 (y^T z) - (y^T z)(z^T D z) = (y^T z)\big[\lambda_1 (z^T z) - z^T D z\big]$$

$$= (y^T z)\Big[\lambda_1 \sum_j z_j^2 - \sum_j \lambda_j z_j^2\Big] = (y^T z)\Big[\sum_j (\lambda_1 - \lambda_j)\, z_j^2\Big]$$

Hence, if $y^T z > 0$, then $h(u) > 0$ (unless $u$ is already parallel to $v$, in which case $h(u) = 0$). In other words: if $v^T u > 0$ then $-\|v\|\sin(\theta)\,\dot\theta > 0$, hence $\dot\theta < 0$. For our original system, this means that any trajectory starting at a $w$ with $\langle w, w_{EC}\rangle_C > 0$ converges in time towards the principal eigenvector $w_{EC}$ of the matrix $EC$.

□
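The basin structure described by the Theorem can be illustrated with a small simulation. The sketch below integrates the flow $\dot w = ECw - (w^T C w)w$ with a forward-Euler scheme (the learning rate is absorbed into the time scale) and checks that each trajectory ends at $+w_{EC}$ or $-w_{EC}$ according to the sign of $w_0^T C\, w_{EC}$. The matrices, step size, and initial conditions are assumptions chosen for the example.

```python
import numpy as np

q, v, c, delta = 0.8, 1.0, -0.4, 0.2       # illustrative parameters
E = np.array([[q, 1 - q], [1 - q, q]])     # assumed crosstalk matrix
C = np.array([[v + delta, c], [c, v]])     # assumed covariance matrix
EC = E @ C

lam, vecs = np.linalg.eig(EC)
k = int(np.argmax(np.real(lam)))
lam1 = float(np.real(lam[k]))
w_ec = np.real(vecs[:, k])
w_ec = w_ec * np.sqrt(lam1 / (w_ec @ C @ w_ec))   # normalize: w^T C w = lambda_1

def integrate(w0, dt=0.01, steps=20000):
    """Forward-Euler integration of dw/dt = EC w - (w^T C w) w."""
    w = np.array(w0, dtype=float)
    for _ in range(steps):
        w = w + dt * (EC @ w - (w @ C @ w) * w)
    return w

outcomes = []
for w0 in ([0.5, -1.0], [-0.5, 1.0], [0.2, 0.9]):
    side = np.sign(np.array(w0) @ C @ w_ec)   # which side of the separating subspace
    w_final = integrate(w0)
    outcomes.append(bool(np.allclose(w_final, side * w_ec, atol=1e-6)))
    print(w0, "->", w_final, "expected", side * w_ec)
```

The side of the separating subspace is measured with the $C$-weighted product, which is the quantity whose sign is preserved by the change of variable $u = \sqrt{C}\,w$ used in the proof.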
Appendix 3. Sensitivity analysis

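This is a technical section, in which we calculate how the invariant directions $z_{1,2}$ change when varying $q$.

Remark. In order to simplify further computations, we rewrite:

$$z_{1,2} = \frac{-q\delta \pm \sqrt{\Delta}}{2\beta} = \frac{-q\delta \pm \sqrt{q^2\delta^2 + 4\beta[\beta + (1-q)\delta]}}{2\beta} = \frac{1}{2}\left(-\frac{q\delta}{\beta}\right) \pm \frac{1}{2}\,\mathrm{sign}(\beta)\sqrt{\left(\frac{q\delta}{\beta}\right)^2 + \frac{4[\beta + (1-q)\delta]}{\beta}}$$

where $\beta = cq + (1-q)v$. Call

$$\gamma = \frac{q\delta}{\beta} = \frac{q\delta}{cq + (1-q)v}$$

(in this appendix $\gamma$ denotes this ratio, not the learning rate). Then $\dfrac{(1-q)\delta}{\beta} = \dfrac{\delta - c\gamma}{v}$, and hence:

$$z_{1,2} = \frac{1}{2}\left[-\gamma \pm \mathrm{sign}(\beta)\sqrt{\gamma^2 + 4\left(1 + \frac{\delta - c\gamma}{v}\right)}\,\right]$$

Then we can use the chain rule to express

$$\frac{dz_{1,2}}{dq} = \frac{dz_{1,2}}{d\gamma}\,\frac{d\gamma}{dq}$$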

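Lemma. The derivative $\dfrac{d\gamma}{dq} = \dfrac{\delta v}{\beta^2}$. Also, for $q \in (1/2, q^*) \cup (q^*, 1]$ (i.e., where $\beta \neq 0$), we have:

$$\frac{dz_{1,2}}{d\gamma} < 0$$

Proof. Since $\beta' = d\beta/dq = c - v$,

$$\frac{d\gamma}{dq} = \frac{d}{dq}\left(\frac{q\delta}{\beta}\right) = \frac{\delta(\beta - q\beta')}{\beta^2} = \frac{\delta[\beta - q(c - v)]}{\beta^2} = \frac{\delta v}{\beta^2}$$

For $q \in (1/2, q^*) \cup (q^*, 1]$, we also have directly that:

$$\frac{dz_{1,2}}{d\gamma} = \frac{1}{2}\left[-1 \pm \mathrm{sign}(\beta)\,\frac{\gamma - \frac{2c}{v}}{\sqrt{\gamma^2 + 4\left(1 + \frac{\delta - c\gamma}{v}\right)}}\right]$$

Since

$$\gamma^2 + 4\left(1 + \frac{\delta - c\gamma}{v}\right) - \left(\gamma - \frac{2c}{v}\right)^2 = \frac{4[v(v+\delta) - c^2]}{v^2} > 0,$$

it follows that

$$\sqrt{\gamma^2 + 4\left(1 + \frac{\delta - c\gamma}{v}\right)} > \left|\gamma - \frac{2c}{v}\right| \ge \pm\,\mathrm{sign}(\beta)\left(\gamma - \frac{2c}{v}\right)$$

and hence

$$\pm\,\mathrm{sign}(\beta)\,\frac{\gamma - \frac{2c}{v}}{\sqrt{\gamma^2 + 4\left(1 + \frac{\delta - c\gamma}{v}\right)}} < 1$$

It immediately follows from the expression for $dz_{1,2}/d\gamma$ above that $\dfrac{dz_{1,2}}{d\gamma} < 0$.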

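□

Corollary. The slope of the invariant lines changes with respect to $q$ according to:

$$\frac{dz_{1,2}}{dq} = \frac{\delta v}{2\beta^2}\left[-1 \pm \frac{q\delta - \frac{2c\beta}{v}}{\sqrt{\Delta}}\right]$$

hence $\mathrm{sign}\left(\dfrac{dz_{1,2}}{dq}\right) = -\mathrm{sign}(\delta)$ for $q \in (1/2, q^*) \cup (q^*, 1]$.

Proof. The conclusion follows directly from the chain rule $\dfrac{dz_{1,2}}{dq} = \dfrac{d\gamma}{dq}\cdot\dfrac{dz_{1,2}}{d\gamma} = \dfrac{\delta v}{\beta^2}\cdot\dfrac{dz_{1,2}}{d\gamma}$, noting that $\mathrm{sign}(\beta)\,(\gamma - 2c/v)\big/\sqrt{\gamma^2 + 4(1 + (\delta - c\gamma)/v)} = (q\delta - 2c\beta/v)/\sqrt{\Delta}$.

□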
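A quick numerical check of the Corollary, reading it as $dz_{1,2}/dq = \frac{\delta v}{2\beta^2}\big[-1 \pm (q\delta - 2c\beta/v)/\sqrt{\Delta}\big]$ with $\beta = cq + (1-q)v$ and $\Delta = q^2\delta^2 + 4\beta[\beta + (1-q)\delta]$: the sketch below compares this expression with central finite differences of $z_{1,2}(q)$. The parameter values and sample points are illustrative assumptions, chosen away from $q^*$.

```python
import numpy as np

def beta(q, v, c):
    return c * q + (1 - q) * v

def slopes(q, v, c, delta):
    """z_{1,2} = (-q*delta +/- sqrt(Delta)) / (2*beta)."""
    b = beta(q, v, c)
    root = np.sqrt((q * delta) ** 2 + 4 * b * (b + (1 - q) * delta))
    return np.array([(-q * delta + root) / (2 * b), (-q * delta - root) / (2 * b)])

def slope_derivatives(q, v, c, delta):
    """Closed-form dz_{1,2}/dq as read from the Corollary."""
    b = beta(q, v, c)
    root = np.sqrt((q * delta) ** 2 + 4 * b * (b + (1 - q) * delta))
    ratio = (q * delta - 2 * c * b / v) / root
    return delta * v / (2 * b ** 2) * np.array([-1 + ratio, -1 - ratio])

v, c, delta = 1.0, -0.4, 0.2     # illustrative values (delta > 0)
h = 1e-6                         # finite-difference step
checks = []
for q in (0.55, 0.65, 0.8, 0.95):
    numeric = (slopes(q + h, v, c, delta) - slopes(q - h, v, c, delta)) / (2 * h)
    exact = slope_derivatives(q, v, c, delta)
    checks.append((q, exact, numeric))
    print(f"q = {q}: closed form {exact}, finite differences {numeric}")
```

For $\delta > 0$ both derivatives come out negative at every sample point, in line with the sign statement of the Corollary.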



At this stage, we can distinguish two cases: $\delta < 0$ and $\delta > 0$. We analyze in detail the case $\delta > 0$. The other is very similar (although not symmetric about $\delta = 0$), and we will only state the results and show some graphic illustrations.



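Proposition 2.6. If $\delta > 0$, then $\dfrac{dz_{1,2}}{dq} < 0$; hence both $z_{1,2}$ are decreasing as $q \in (1/2, q^*) \cup (q^*, 1]$. Furthermore, the monotonicity, asymptotes and end behavior of the functions $z_{1,2}(q)$ are sketched in the following table:

$$\begin{array}{c|ccccc}
q & 1/2 & & q^* & & 1 \\ \hline
z_1 & 1 & \searrow & \dfrac{1-q^*}{q^*} & \searrow & \dfrac{\delta - \sqrt{4c^2 + \delta^2}}{-2c} \\[2ex]
z_2 & -\dfrac{v + c + \delta}{v + c} & \searrow & -\infty \;\big\|\; +\infty & \searrow & \dfrac{\delta + \sqrt{4c^2 + \delta^2}}{-2c}
\end{array}$$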

Remark 1. In the system's phase plane, this corresponds to a continuous clockwise rotation of the two invariant lines (the vertical asymptote at $q^*$ corresponds to the $z_2$ line going through the vertical position). A phase-plane sketch of this process is shown in Figure 2 in the main text, and the graphs of the actual functions $z_{1,2}(q)$ and of their derivatives $dz_{1,2}/dq$, for some fixed values of the parameters $v$, $c$, $\delta > 0$, are shown in the Figure below.

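Remark 2. Clearly from the table, the angular position of the two equilibria at $q = q^*$ does not depend on the bias $\delta$. It can be easily shown that the norm of these points is also independent of $\delta$. For example, the norm of the stable equilibrium is:

$$\|w\|^2 = \frac{\mu(z_1^2 + 1)}{v z_1^2 + 2c z_1 + (v + \delta)} = \frac{[(1-q^*)c + q^*(v+\delta)]\,(1 - 2q^* + 2q^{*2})}{v(1-q^*)^2 + 2c\,q^*(1-q^*) + (v+\delta)\,q^{*2}} \qquad (11)$$

$$= \frac{[(1-q^*)c + q^*(v+\delta)]\,(1 - 2q^* + 2q^{*2})}{q^*\,[(1-q^*)c + q^*(v+\delta)]} = \frac{1 - 2q^* + 2q^{*2}}{q^*} = q^*(z_1^2 + 1) \qquad (12)$$

where $\mu = (1-q^*)c + q^*(v+\delta)$, $z_1 = (1-q^*)/q^*$, and the denominator was simplified using $cq^* + (1-q^*)v = 0$. Hence the position of the two equilibria at critical quality is the same for all bias values $\delta > 0$.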
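The independence from the bias claimed in Remark 2 is easy to check numerically. The sketch below evaluates $\|w\|^2 = \mu(z_1^2+1)/(v z_1^2 + 2c z_1 + v + \delta)$ at $q = q^*$ for several values of $\delta$ and compares the result with $(1 - 2q^* + 2q^{*2})/q^*$. It assumes $q^* = v/(v-c)$, which follows from setting $cq + (1-q)v = 0$; the numerical values of $v$, $c$ and $\delta$ are illustrative.

```python
v, c = 1.0, -0.4                      # illustrative values with c < 0 < v
q_star = v / (v - c)                  # critical quality: c*q + (1 - q)*v = 0
z1 = (1 - q_star) / q_star            # slope of the non-vertical invariant line at q*

def squared_norm(delta):
    """||w||^2 of the equilibrium with slope z1 at q = q*, for bias delta."""
    mu = (1 - q_star) * c + q_star * (v + delta)   # corresponding eigenvalue of EC
    return mu * (z1 ** 2 + 1) / (v * z1 ** 2 + 2 * c * z1 + (v + delta))

closed_form = (1 - 2 * q_star + 2 * q_star ** 2) / q_star
norms = [squared_norm(d) for d in (0.05, 0.2, 0.5, 1.0)]
print("closed form:", closed_form)
print("per-bias values:", norms)
```

All four values agree with the closed form up to rounding, so the equilibrium does not move as the bias changes.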

Proof. The monotonicity follows from the Corollary. The limit values follow from direct computation. For example:

$$\lim_{q \to q^*+} z_2 = \lim_{q \to q^*+} \frac{-q\delta - \sqrt{\Delta}}{2\beta} = \lim_{\beta \to 0-} \frac{-q^*\delta}{\beta} = +\infty$$

$$\lim_{q \to q^*-} z_2 = \lim_{\beta \to 0+} \frac{-q^*\delta}{\beta} = -\infty$$

$$\lim_{q \to q^*} z_1 = \lim_{q \to q^*} \frac{-q\delta + \sqrt{\Delta}}{2\beta}\cdot\frac{-q\delta - \sqrt{\Delta}}{-q\delta - \sqrt{\Delta}} = \lim_{q \to q^*} \frac{2[\beta + (1-q)\delta]}{q\delta + \sqrt{\Delta}} = \frac{1 - q^*}{q^*}$$

□

[Figure: two panels, A and B, each plotted against quality $q$.]

Figure: Slopes of invariant lines (A) and their change as $q$ is varied (B). In both panels, the other parameter values were fixed to $v = 1$, $c = -0.4$ and $\delta = 0.2$. Notice that, in accordance with Proposition 2.6, $z_1$ and its derivative $dz_1/dq$ are continuous (blue curves) on $[1/2, 1]$, while $z_2$ and its derivative $dz_2/dq$ (green curves) have vertical asymptotes at $q = q^* = \dfrac{v}{v - c} \approx 0.71$.



Remark 3. If $\delta < 0$, then $\dfrac{dz_{1,2}}{dq} > 0$ and hence both $z_1$ and $z_2$ are increasing as $q \in (1/2, q^*) \cup (q^*, 1]$. In the system's phase plane, this corresponds to a continuous counter-clockwise rotation of the two invariant lines.

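Proposition 2.7. For $\delta > 0$, the angle $\theta_{1,2} \in [-\pi/2, \pi/2]$ between each invariant line and the $w_1$ abscissa is decreasing with respect to the parameter $q$. Moreover, the angular rate of change is finite at all $q \in (1/2, 1]$.

$$\begin{array}{c|ccccc}
q & 1/2 & & q^* & & 1 \\ \hline
d\theta_1/dq & & (-) & -\dfrac{\det(C)}{v\delta\,(1 - 2q^* + 2q^{*2})} & (-) & \\[2ex]
d\theta_2/dq & & (-) & -\dfrac{v}{q^{*2}\delta} & (-) &
\end{array}$$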


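Proof. The relation between the slope $z$ and the actual angle $\theta$ is given by $z = \tan\theta$ (we drop the indices wherever there is no danger of confusion). Hence, for $q \in (1/2, q^*) \cup (q^*, 1]$, we have:

$$\frac{dz}{d\gamma} = \frac{1}{\cos^2(\theta)}\,\frac{d\theta}{d\gamma}, \qquad\text{i.e.}\qquad \frac{d\theta}{d\gamma} = \frac{dz}{d\gamma}\cdot\frac{1}{z^2 + 1}$$

So

$$\frac{d\theta}{dq} = \frac{d\gamma}{dq}\cdot\frac{d\theta}{d\gamma} = \frac{\delta v}{\beta^2}\cdot\frac{dz}{d\gamma}\cdot\frac{1}{z^2 + 1} \qquad (13)$$

hence $\mathrm{sign}\left(\dfrac{d\theta}{dq}\right) = -\mathrm{sign}(\delta)$, for all $q \in (1/2, q^*) \cup (q^*, 1]$.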



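We have yet to check that the rate of change $\dfrac{d\theta}{dq}$ remains finite (i.e., does not blow up to $-\infty$) as $q \to q^*$. Elaborating on (13) with the Corollary's expression for $dz_{1,2}/dq$, and using $z_2^2 + 1 = \big[(q\delta + \sqrt{\Delta})^2 + 4\beta^2\big]/(4\beta^2)$ together with $\sqrt{\Delta} \to q^*\delta$, we have:

$$\lim_{q \to q^*} \frac{d\theta_2}{dq} = \lim_{q \to q^*} \frac{\delta v}{2\beta^2}\left[-1 - \frac{q\delta - \frac{2c\beta}{v}}{\sqrt{\Delta}}\right]\cdot\frac{4\beta^2}{q^2\delta^2 + \Delta + 4\beta^2 + 2q\delta\sqrt{\Delta}}$$

$$= 2\delta v\cdot\frac{-1 - \frac{q^*\delta}{q^*\delta}}{2q^{*2}\delta^2 + 2q^{*2}\delta^2} = -\frac{v}{q^{*2}\delta} \qquad (14)$$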
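We also notice that

$$\lim_{q \to q^*} z_1 = \frac{1 - q^*}{q^*} \;\Longrightarrow\; \lim_{q \to q^*} (z_1^2 + 1) = \frac{1 - 2q^* + 2q^{*2}}{q^{*2}}$$

and that

$$\lim_{q \to q^*} \frac{dz_1}{dq} = \lim_{q \to q^*} \frac{\delta v}{2\beta^2}\cdot\frac{\left(q\delta - \frac{2c\beta}{v}\right) - \sqrt{\Delta}}{\sqrt{\Delta}} = \lim_{q \to q^*} \frac{\delta v}{2\beta^2\sqrt{\Delta}}\cdot\frac{\left(q\delta - \frac{2c\beta}{v}\right)^2 - \Delta}{\left(q\delta - \frac{2c\beta}{v}\right) + \sqrt{\Delta}}$$

$$= \lim_{q \to q^*} \frac{\delta v}{2\beta^2\sqrt{\Delta}}\cdot\frac{-4\beta^2\det(C)}{v^2}\cdot\frac{1}{\left(q\delta - \frac{2c\beta}{v}\right) + \sqrt{\Delta}} = \frac{\delta v}{2\beta^2\, q^*\delta}\cdot\frac{-4\beta^2\det(C)}{v^2}\cdot\frac{1}{2q^*\delta} = \frac{-\det(C)}{v\delta\, q^{*2}}$$

where $\det(C) = v(v+\delta) - c^2$ and we used $\left(q\delta - \frac{2c\beta}{v}\right)^2 - \Delta = -\dfrac{4\beta^2\det(C)}{v^2}$.

Combining these two limits, we have:

$$\lim_{q \to q^*}\frac{d\theta_1}{dq} = \lim_{q \to q^*}\frac{dz_1}{dq}\cdot\frac{1}{z_1^2 + 1} = \frac{-\det(C)}{v\delta\, q^{*2}}\cdot\frac{q^{*2}}{1 - 2q^* + 2q^{*2}} = \frac{-\det(C)}{v\delta\,(1 - 2q^* + 2q^{*2})}$$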
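The two limiting angular rates can be checked numerically by evaluating $d\theta_{1,2}/dq = (dz_{1,2}/dq)/(z_{1,2}^2 + 1)$ just to the left and right of $q^*$. In the sketch below, the closed forms $-\det(C)/[v\delta(1 - 2q^* + 2q^{*2})]$ and $-v/(q^{*2}\delta)$ are the values being tested; the parameter values and the offset `eps` are illustrative assumptions, and $q^* = v/(v-c)$ is taken from the condition $cq + (1-q)v = 0$.

```python
import numpy as np

v, c, delta = 1.0, -0.4, 0.2          # illustrative values (delta > 0)
q_star = v / (v - c)                  # critical quality, where beta = 0
det_C = v * (v + delta) - c ** 2

def angle_rates(q):
    """d(theta_{1,2})/dq = (dz_{1,2}/dq) / (z_{1,2}^2 + 1), with z = tan(theta)."""
    b = c * q + (1 - q) * v
    root = np.sqrt((q * delta) ** 2 + 4 * b * (b + (1 - q) * delta))
    z = np.array([(-q * delta + root) / (2 * b), (-q * delta - root) / (2 * b)])
    ratio = (q * delta - 2 * c * b / v) / root
    dz = delta * v / (2 * b ** 2) * np.array([-1 + ratio, -1 - ratio])
    return dz / (z ** 2 + 1)

limit_theta1 = -det_C / (v * delta * (1 - 2 * q_star + 2 * q_star ** 2))
limit_theta2 = -v / (q_star ** 2 * delta)

eps = 1e-4                            # small offset from q* (beta is nonzero there)
left = angle_rates(q_star - eps)
right = angle_rates(q_star + eps)
print("left of q*: ", left)
print("right of q*:", right)
print("claimed limits:", limit_theta1, limit_theta2)
```

Both one-sided evaluations stay finite and close to the claimed limits, even though $z_2$ itself diverges at $q^*$.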
Appendix 4. Description of the ellipse attractor



For unbiased inputs $\delta = 0$ and critical quality $q = q^*$, $EC$ has a double eigenvalue $\mu = v + c = (2q^* - 1)(v - c)$. The eigenspace of $EC$ is $\mathbb{R}^2$, hence each direction (i.e. slope $z = \tan\theta \in [-\infty, +\infty]$) produces two equilibria, normalized as follows:

$$\|w\|^2 = \frac{\mu(z^2 + 1)}{vz^2 + 2cz + v} = \frac{(v + c)\,[\tan^2\theta + 1]}{v\tan^2\theta + 2c\tan\theta + v} = \frac{v + c}{v\sin^2\theta + 2c\sin\theta\cos\theta + v\cos^2\theta} = \frac{v + c}{v + c\sin(2\theta)} \qquad (18)$$



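We show that this is the polar equation of an ellipse with foci along the first diagonal $\theta = \pi/4$. Indeed, after rotating the coordinate axes so that the first diagonal becomes the horizontal axis (i.e., replacing $\theta$ by $\theta + \pi/4$), equation (18) becomes:

$$\|w\|^2 = \frac{v + c}{v + c\sin\!\big(2\big[\theta + \tfrac{\pi}{4}\big]\big)} = \frac{v + c}{v + c\cos(2\theta)} = \frac{v + c}{v\,[\cos^2\theta + \sin^2\theta] + c\,[\cos^2\theta - \sin^2\theta]}$$

$$= \frac{v + c}{(v + c)\cos^2\theta + (v - c)\sin^2\theta} = \frac{v + c}{v^2 - c^2}\cdot\frac{v^2 - c^2}{(v + c)\cos^2\theta + (v - c)\sin^2\theta} \qquad (19)$$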



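In polar coordinates, this is the equation of an ellipse

$$\rho^2 = \frac{a^2 b^2}{a^2\cos^2\theta + b^2\sin^2\theta}$$

with radial coordinate $\rho = \|w\|\,\dfrac{\sqrt{v^2 - c^2}}{\sqrt{v + c}} = \|w\|\sqrt{v - c}$ and angular coordinate $\theta$, and with semi-axes $a = \sqrt{v + c}$ and $b = \sqrt{v - c}$. For negative correlations ($c < 0$) we have $b > a$, so $b$ is the semi-major radius, attained at $\theta = 0$ in the rotated coordinates (i.e. along the first diagonal), and $a$ is the semi-minor radius.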
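As a numerical illustration of the ellipse attractor, the sketch below samples the equilibria $\|w\|^2 = (v+c)/(v + c\sin 2\theta)$ for $\delta = 0$, checks that each satisfies $w^T C w = v + c$, and verifies that, after rotating the first diagonal onto the horizontal axis, the points satisfy the Cartesian ellipse equation. In the unscaled coordinate $\|w\|$ the semi-axes are $1$ and $\sqrt{(v+c)/(v-c)}$ (the text's $a$ and $b$ refer to the rescaled radius $\rho$). The parameter values are illustrative assumptions with $c < 0$.

```python
import numpy as np

v, c = 1.0, -0.4                          # illustrative values, delta = 0, c < 0
C = np.array([[v, c], [c, v]])            # covariance matrix for unbiased inputs
theta = np.linspace(0.0, 2.0 * np.pi, 200, endpoint=False)

# Equilibrium radius in each direction theta, from equation (18).
radius = np.sqrt((v + c) / (v + c * np.sin(2 * theta)))
w = np.stack([radius * np.cos(theta), radius * np.sin(theta)])   # shape (2, 200)

# Each sampled point should satisfy the normalization w^T C w = mu = v + c.
c_norm = np.einsum("in,ij,jn->n", w, C, w)

# Rotate coordinates clockwise by pi/4: the first diagonal becomes the x-axis.
x = (w[0] + w[1]) / np.sqrt(2.0)
y = (w[1] - w[0]) / np.sqrt(2.0)

semi_major = 1.0                          # along the first diagonal
semi_minor = np.sqrt((v + c) / (v - c))   # along the second diagonal
ellipse_lhs = (x / semi_major) ** 2 + (y / semi_minor) ** 2
print("max deviation from the ellipse equation:", np.max(np.abs(ellipse_lhs - 1.0)))
```

The curve of equilibria is simply the level set $w^T C w = v + c$, which is why it is an ellipse aligned with the diagonals.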